udithagalgamuwa
Calcite | Level 5

Hi,

I am developing logistic regression models to evaluate the safety effectiveness of roadway countermeasures. I am interested not in the predictions but in the effect of the explanatory variables on the response variable (crashes). I can model crashes as the response and the other factors as explanatory variables using either a binomial or a multinomial logistic regression model.

How can I determine which model (multinomial or binomial) is more suitable for a given dataset?

 

Thank you

9 REPLIES
Reeza
Super User

Does your outcome have only two levels? The ROC curve with its AUC, and VIF, may be of interest.

 

If the outcome is a count, wouldn't this be a Poisson regression? 

udithagalgamuwa
Calcite | Level 5

Hi, Reeza,

Thank you so much for your quick reply.

Yes, it is a count, and Poisson models or generalized mixed models would be one approach to predicting the count. But in my area of research, we use logistic regression to develop case-control models. Below is a brief description of my data and how we analyze them.

 

Original dataset (used to develop the multinomial logistic regression models)

Road Segment   Number of Crashes   Daily Traffic (number of vehicles)   Segment Length (miles)
1              0                   100                                  0.5
2              0                   1050                                 0.8
3              1                   2100                                 1.2
4              2                   2500                                 2.1
5              0                   950                                  0.6
6              3                   3000                                 2.3
7              2                   2400                                 1.8
8              1                   1800                                 1.1
9              0                   1150                                 1
10             4                   4500                                 2.9
11             5                   4600                                 2.9
12             0                   800                                  0.35

 

For the binomial logistic regression, we recode the response: every segment with at least one crash is assigned 1, and segments with no crashes stay at 0, as follows

Road Segment   Number of Crashes   Daily Traffic (number of vehicles)   Segment Length (miles)
1              0                   100                                  0.5
2              0                   1050                                 0.8
3              1                   2100                                 1.2
4              1                   2500                                 2.1
5              0                   950                                  0.6
6              1                   3000                                 2.3
7              1                   2400                                 1.8
8              1                   1800                                 1.1
9              0                   1150                                 1
10             1                   4500                                 2.9
11             1                   4600                                 2.9
12             0                   800                                  0.35
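The recoding described above can be sketched in a few lines. This is only an illustration in Python (not SAS), using the crash counts from the twelve sample segments; the names `crashes` and `to_binary` are made up for the example.

```python
# Crash counts for the twelve sample segments listed in this post.
crashes = [0, 0, 1, 2, 0, 3, 2, 1, 0, 4, 5, 0]


def to_binary(count):
    """Collapse a crash count to a 0/1 indicator: 1 if any crash occurred."""
    return 1 if count > 0 else 0


# Response used for the binomial model.
crash_flags = [to_binary(c) for c in crashes]

# Number of distinct response levels before and after recoding.
levels_original = len(set(crashes))
levels_binary = len(set(crash_flags))

print(crash_flags)
print(levels_original, levels_binary)
```

The recoding collapses six distinct response levels into two, so the binomial and multinomial models are fitted to different response variables.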

 

What I need to know is how, after developing models with these two methods, to determine which model fits the dataset better.

 

Thanks

StatDave
SAS Super FREQ

If that is the entire set of data available, then I don't think you will be able to fit a multinomial model (whether nominal or ordinal - you didn't specify) since the data are just too sparse. Even with the binary version of the response there are sparseness problems for a model with just those two predictors (vehicles, miles). In that case, the FIRTH option can be used to fit the model by penalized likelihood, which yields finite parameter estimates (and both predictors are nonsignificant). A Poisson model can be fit in GENMOD - again, both predictors are nonsignificant. If you have more data so that you can successfully fit the various models of interest to the number-of-crashes response, then you could use the Vuong test to compare pairs of strictly nonnested models.
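For readers unfamiliar with the Vuong test, here is a rough sketch of the unadjusted statistic, written in Python rather than SAS. It assumes you already have the per-observation log-likelihoods of two nonnested models fitted to the same response; the function name `vuong_z` and the numbers are hypothetical, and this is not the VUONG macro itself.

```python
import math
from statistics import NormalDist, mean, pstdev


def vuong_z(loglik_a, loglik_b):
    """Unadjusted Vuong z statistic for two nonnested models.

    Each argument holds one log-likelihood contribution per observation,
    in the same observation order and for the same response variable.
    """
    if len(loglik_a) != len(loglik_b):
        raise ValueError("both models must be evaluated on the same observations")
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    n = len(diffs)
    # Standard deviation with divisor n (pstdev), applied to the differences.
    return math.sqrt(n) * mean(diffs) / pstdev(diffs)


# Hypothetical pointwise log-likelihoods, for illustration only.
ll_model_a = [-0.2, -0.5, -0.1, -0.9, -0.3]
ll_model_b = [-0.4, -0.6, -0.3, -0.8, -0.6]

z = vuong_z(ll_model_a, ll_model_b)
# Two-sided p-value from the standard normal reference distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(z, p_value)
```

A large positive z favors the first model and a large negative z favors the second; values near zero mean the test cannot distinguish them.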

udithagalgamuwa
Calcite | Level 5

Thank you for your valuable suggestion.

The response is ordinal. I have shown only a fraction of my dataset here; the original dataset has more than 20,000 rows. I have developed both binomial and multinomial regression models using "proc logistic", and both models are significant. But I need to select the one model that has better predictive power and is most suitable for the given dataset.

 

I cannot use the Vuong test because it requires that both models be fit using exactly the same set of response values, and my response variables are not the same.

StatDave
SAS Super FREQ

The Vuong test does not require different response variables; it just requires that the models be nonnested. In fact, the VUONG macro requires the models being compared to have the same response.

Reeza
Super User

AUC - the area under the ROC curve - is considered one suitable measure for comparing the accuracy of models.

 

For a 2x2 table you can also look at the specificity and sensitivity.

If there is any clinical significance, the number needed to treat (or to detect) is also a good measure.
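As a small illustration of the 2x2-table measures, the sketch below computes sensitivity, specificity, and accuracy from observed and predicted 0/1 outcomes. It is written in Python rather than SAS; the `predicted` values are invented for the example and `confusion_counts` is a hypothetical helper name.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fn, fp, tn) for 0/1 actual and predicted outcomes."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fn, fp, tn


# Observed crash indicators and invented model classifications.
actual = [0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
predicted = [0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0]

tp, fn, fp, tn = confusion_counts(actual, predicted)
sensitivity = tp / (tp + fn)   # share of crash segments classified as crash
specificity = tn / (tn + fp)   # share of non-crash segments classified as non-crash
accuracy = (tp + tn) / len(actual)
print(sensitivity, specificity, accuracy)
```

All three measures depend on the probability cutoff used to turn predicted probabilities into 0/1 classifications.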

udithagalgamuwa
Calcite | Level 5

Hi, Reeza,

Thank you for your valuable suggestion.

 

I have used 2x2 tables to calculate the specificity, sensitivity, and accuracy of the binary logistic regression models. But I am not sure how to use them for the multinomial regression models (my aim is to compare the binomial model with the multinomial regression model).

 

I have looked into AUC. Do you have any suggestions on where I can find more details about using that method in the basic version of SAS/STAT? Is ROC the same as AUC?

 

Thanks

 

StatDave
SAS Super FREQ

ROC analysis (producing an ROC curve and computing the AUC - the area underneath the curve) applies only to a binary model, not the multinomial. 
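To make the relationship concrete: the ROC curve is the plot, and the AUC is a single number summarizing it. For a binary response, the AUC equals the proportion of (event, non-event) pairs in which the event has the higher predicted probability, counting ties as one half. The sketch below is plain Python rather than SAS, with made-up scores and a hypothetical function name `auc`.

```python
def auc(actual, scores):
    """AUC as the share of (event, non-event) pairs ranked correctly.

    Ties in the predicted score count as half a correctly ranked pair.
    """
    events = [s for a, s in zip(actual, scores) if a == 1]
    non_events = [s for a, s in zip(actual, scores) if a == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for ne in non_events:
            if e > ne:
                concordant += 1.0
            elif e == ne:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Made-up outcomes and predicted probabilities, for illustration only.
actual = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(actual, scores))  # 3 of the 4 pairs are ranked correctly
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation of events from non-events.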

udithagalgamuwa
Calcite | Level 5

Do you have any suggestions for comparing binomial logistic regression models with multinomial logistic regression models and determining which one is better?

Thanks
