Shivi82
Quartz | Level 8

Hi.

During the modelling process, suppose we have selected a set of variables and find that one highly significant variable has missing values. Is there a threshold or industry-standard percentage of missing values below which the variable can be kept in the model and the missing values ignored?

I understand that we can replace the missing values with either the mean or the median of the variable; however, I wanted to know whether there is a statistically accepted threshold.

Regards, Shivi

1 REPLY 1
BruceBrad
Lapis Lazuli | Level 10

There is no agreed threshold. If you are fitting a regression-type model and the variable with missing values is a right-hand-side (explanatory) variable, then one common workaround is to add 'missing' as a separate variable.

More specifically, if the variable with missing values is categorical, you will already be adding dummy variables for each category, so you can just add another category, 'missing'.
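As a rough illustration of the categorical case, here is a minimal Python sketch (not SAS, and not taken from the original reply). The function name `one_hot_with_missing`, the label `"missing"` and the `region` data are all made up for the example, and missing values are assumed to be represented as `None`.

```python
def one_hot_with_missing(values, missing_label="missing"):
    """Dummy-code a categorical column, treating None as its own category.

    Hypothetical helper for illustration only.
    """
    # Replace None with an explicit label so it becomes a level like any other.
    labelled = [missing_label if v is None else v for v in values]
    levels = sorted(set(labelled))
    # One 0/1 column per level; drop one level as the reference category
    # before fitting a model that has an intercept.
    rows = [[1 if v == lvl else 0 for lvl in levels] for v in labelled]
    return levels, rows


# Made-up example data: None marks a missing value.
region = ["north", None, "south", "north", None]
levels, rows = one_hot_with_missing(region)
print(levels)
for row in rows:
    print(row)
```

This assumes no real category is already called "missing"; if one is, pick a different label.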

If the variable with missing values is continuous, then you can code the missing values to some arbitrary value (e.g. zero) and also include an additional dummy variable equal to one when the value is missing.
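The continuous case can be sketched the same way. Again this is a Python illustration under assumptions, not the poster's own code: `add_missing_indicator`, `fill_value` and the `income` data are hypothetical, and missing values are assumed to be `None`.

```python
def add_missing_indicator(values, fill_value=0.0):
    """Return (filled, is_missing) for a continuous column.

    Hypothetical helper for illustration only.
    """
    # Code missing entries to an arbitrary constant (zero by default).
    filled = [fill_value if v is None else float(v) for v in values]
    # Dummy variable: 1 where the original value was missing, else 0.
    is_missing = [1 if v is None else 0 for v in values]
    return filled, is_missing


# Made-up example data: None marks a missing value.
income = [52.0, None, 61.5, None, 48.0]
filled, flag = add_missing_indicator(income)
share_missing = sum(flag) / len(flag)
print(filled, flag, share_missing)
```

Both `filled` and `flag` would then go into the model as explanatory variables. In a linear model that includes both, the choice of fill value only shifts the coefficient on the indicator, not the slope on the variable itself.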
