BTAinRVA
Quartz | Level 8

Good morning all!

 

I'm doing logistic regression modeling with 7 categorical predictor variables and it seems everything is coming up significant and is being included in the final model. Is this simply due to having a rather large sample size (50k+)?

 

Thanks,

Brian

Reeza
Super User
How many levels does each variable have? With a sample that large you're likely overpowered: even very small effects will come out statistically significant, so you need to start looking at effect sizes to decide whether they're practically significant.
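To illustrate the "overpowered" point, here is a minimal sketch in Python (not SAS, and not from the original analysis). The compliance rates of 52% and 50% and the group sizes are made-up numbers chosen only for illustration, and `two_group_summary` is a hypothetical helper. It computes the odds ratio, a 95% Wald confidence interval, and a two-sided p-value for the same effect at two sample sizes.

```python
import math

def two_group_summary(p1, p2, n_per_group):
    """Odds ratio, 95% Wald CI and two-sided p-value for a 2x2 table
    built from expected counts (illustrative, hypothetical helper)."""
    a = p1 * n_per_group          # compliant, group 1
    b = (1 - p1) * n_per_group    # non-compliant, group 1
    c = p2 * n_per_group          # compliant, group 2
    d = (1 - p2) * n_per_group    # non-compliant, group 2
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    return math.exp(log_or), ci, p_value

# Same assumed effect (52% vs 50% compliance), two different sample sizes.
or_small, ci_small, p_small = two_group_summary(0.52, 0.50, 500)
or_large, ci_large, p_large = two_group_summary(0.52, 0.50, 25_000)

print(f"n=500 per group:    OR={or_small:.3f}, CI={ci_small}, p={p_small:.4g}")
print(f"n=25000 per group:  OR={or_large:.3f}, CI={ci_large}, p={p_large:.4g}")
```

The odds ratio is identical in both runs; only the p-value and the width of the interval change. That is why the effect size and its confidence interval say more about practical importance than significance alone.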
BTAinRVA
Quartz | Level 8

Reeza,

 

Thanks for the reply! The eight variables have 5, 6, 6, 2, 15, 2, 9, and 4 levels.

 

Brian

Reeza
Super User

That's actually 4+5+5+1+14+1+8+3 = 41 dummy variables (a variable with k levels contributes k-1), but with 50,000 rows you're still likely fine, assuming no interactions. I would add confidence intervals to my estimates and check whether the effects are large enough to matter.
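As a quick check of that count, here is a small Python sketch using the level counts Brian posted. The row count of 50,000 is an assumption taken from the "50k+" in the question. Rows per parameter is only a rough guide: the usual rule of thumb is based on the number of events in the rarer outcome, which the thread does not give.

```python
# Level counts for the eight categorical predictors, as posted in the thread.
levels = [5, 6, 6, 2, 15, 2, 9, 4]
n_rows = 50_000  # assumption: the question only says "50k+"

# With reference (or effect) coding, a k-level variable needs k - 1 parameters.
dummies = [k - 1 for k in levels]
n_params = sum(dummies)

rows_per_param = n_rows / n_params

print("dummy variables per predictor:", dummies)
print("total dummy variables:", n_params)
print("rows per parameter (rough):", round(rows_per_param))
```

The 4-level variable contributes 3 dummy variables, so the main-effects total is 41 (plus the intercept). Interaction terms would multiply these counts quickly.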

StatDave
SAS Super FREQ

If you are using a WEIGHT statement, this is quite common unless you also use the NORMALIZE option in that statement.
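For context, normalizing rescales the weights so that they sum to the actual number of observations. Unnormalized weights that sum to far more than the row count make the procedure behave as if the sample were that much larger, which shrinks the standard errors. Below is a minimal Python sketch of the rescaling idea only. The weight values are invented and `normalize_weights` is a hypothetical helper, not a SAS function.

```python
import math

def normalize_weights(weights):
    """Rescale weights so they sum to the number of observations
    (hypothetical helper illustrating the idea behind normalization)."""
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]

# Invented weights for four observations; they sum to 500, not 4.
raw = [120.0, 80.0, 200.0, 100.0]
normalized = normalize_weights(raw)

print("sum of raw weights:       ", sum(raw))
print("sum of normalized weights:", sum(normalized))
```

The relative weights are unchanged by the rescaling; only the implied sample size is.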

BTAinRVA
Quartz | Level 8

StatDave,

 

Thanks for the reply. I'm not using the WEIGHT statement, as my data is not aggregated. I have one line per person and a response variable, comply, that is 0 or 1 depending on whether the person is in compliance.

 

Brian
