juanvg1972
Pyrite | Level 9

Hi,

I am working on a logistic regression using 'proc logistic'. One of the input variables is 'age' (ranging from 18 to 78). I have created a new variable, 'age_levels', by grouping 'age' into ranges with a clustering procedure; it has 4 levels.

Using proc freq, I can see that there is an association between 'age_levels' and the target variable of the model.

My question is which variable I should use as the input in proc logistic: 'age' or 'age_levels'?

Can anybody help me?

Thanks

2 REPLIES
PaigeMiller
Diamond | Level 26

You can use any variable(s) that you think are appropriate in such a model.

I usually avoid clustering input variables such as age into four levels. To me, this throws away potentially useful information that is contained in age.

--
Paige Miller
Reeza
Super User
When you use age as a categorical variable, you're saying that the boundaries reflect some real grouping. So if it makes sense that a 41-year-old has different outcomes than a 39-year-old, assuming one age cutoff is 40, then use the categories. Otherwise, the general consensus is not to group the variable unless there's some reason to do so.
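
One way to check the advice in this thread against your own data is to fit both versions and compare them. The following is only a minimal sketch: the dataset name `work.mydata` and the binary target `target` (coded 1 for the event) are hypothetical placeholders, since the original post does not name them; only `age` and `age_levels` come from the question.

```sas
/* Hypothetical names: WORK.MYDATA and TARGET (1 = event) are placeholders. */

/* Model 1: age entered as a continuous predictor */
proc logistic data=work.mydata;
   model target(event='1') = age;
run;

/* Model 2: age_levels entered as a categorical predictor
   with reference-cell coding */
proc logistic data=work.mydata;
   class age_levels / param=ref;
   model target(event='1') = age_levels;
run;
```

Comparing the AIC in the "Model Fit Statistics" table and the c statistic in the "Association of Predicted Probabilities and Observed Responses" table of each run gives a rough indication of whether the grouping loses information. A lower AIC for the continuous model would be consistent with the concern raised above, but this comparison is a guide rather than a formal test.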
