Shivi82
Quartz | Level 8

Hi.

During the modelling process, suppose we have selected a set of variables and find that one highly significant variable has missing values. Is there a threshold or industry-standard percentage of missing values below which the variable can be kept in the model and the missing values ignored?

I understand that we can replace the missing values with the mean or median of the variable; however, I wanted to see whether there is a statistically accepted threshold.

Regards, Shivi

1 REPLY
BruceBrad
Lapis Lazuli | Level 10

There is no agreed threshold. If you are fitting a regression-type model and the variable with missing values is a right-hand-side (RHS) variable, one common workaround is to add 'missing' as a separate variable.

More specifically, if the variable with missing values is categorical, you will already be adding dummy variables for each category. You can then simply add another category, 'missing'.
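To illustrate the categorical case, here is a minimal sketch in plain Python (not SAS), assuming missing values are represented as `None`. The function name `encode_with_missing_level`, the `region` column, and the sample values are all hypothetical and only serve the example.

```python
def encode_with_missing_level(values, prefix, missing_label="missing"):
    """Dummy-code a categorical column, treating None as its own level."""
    # Turn None into an explicit label so it behaves like any other category.
    filled = [missing_label if v is None else v for v in values]
    levels = sorted(set(filled))
    # Build one 0/1 dummy per level, including the 'missing' level.
    rows = [{f"{prefix}_{lvl}": int(v == lvl) for lvl in levels} for v in filled]
    return levels, rows


# Hypothetical data: the second observation has no recorded region.
levels, rows = encode_with_missing_level(["north", None, "south", "north"], "region")
for row in rows:
    print(row)
```

In a model with an intercept you would normally drop one of these dummies as the reference level, exactly as you would for the ordinary categories.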

If the variable with missing values is continuous, you can code the missing values to some arbitrary value (e.g. zero) and also include an additional dummy variable equal to one when the variable is missing.
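The continuous case can be sketched the same way. This is an illustrative plain-Python example rather than SAS code, assuming missing values are stored as `None`; the function name `add_missing_indicator` and the `income` values are made up for the example.

```python
def add_missing_indicator(values, fill_value=0.0):
    """Return (filled values, 0/1 missing flags) for a continuous column."""
    # Replace each missing value with the arbitrary fill value.
    filled = [fill_value if v is None else float(v) for v in values]
    # Flag the rows that were originally missing.
    is_missing = [1 if v is None else 0 for v in values]
    return filled, is_missing


# Hypothetical data: two observations have no recorded income.
income = [52.0, None, 61.5, None, 48.0]
income_filled, income_missing = add_missing_indicator(income)
print(income_filled)
print(income_missing)
```

Both columns then go into the regression together. In a linear model the indicator absorbs the level shift for the missing rows, so the particular fill value should change only the indicator's coefficient, not the fit.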

