This question is an overlap of methodology and SAS programming - hopefully it fits here ...
I wish to build a predictive model with explanatory variables that have different types of missing values.
e.g. (this is made up)
Response Variable: Primary policy holder of a current insurance policy purchases an additional insurance benefit (add-on)
Explanatory Variable 1: Number of customers on policy (no missing values)
Explanatory Variable 2: How the current policy was purchased (sales channel) (can include missing values. Missing values = unknown)
Explanatory Variable 3: Country of origin (can include missing values. Missing values = unknown)
Explanatory Variable 4: How many claims has the customer made (0+) (no missing values)
Explanatory Variable 5: Maximum settlement time of claims made (if missing, this is because no claims were made)
Explanatory Variable 6: Maximum claim amount (if missing, this is because no claims were made)
Is there a way to distinguish the "missing value" in explanatory variables 5 and 6 (missing because it is not applicable) from the missing values in explanatory variables 2 and 3 (missing because the value is unknown)?
Effectively, I want to treat the missing values in explanatory variables 5 and 6 as a category of their own, but those in variables 2 and 3 as genuinely missing.
My first step was to use hpsplit to gauge which interactions to include in a logistic regression model (as per the paper "Methods for Interaction Detection in Predictive Modeling Using SAS"). I see that SAS has special missing values (.a - .z); however, hpsplit does not seem to treat them differently. There appears to be a blanket treatment for all missing values via assignmissing=BRANCH|NONE|POPULAR|SIMILAR (from SAS/STAT® 14.1 User's Guide, The HPSPLIT Procedure).
Any suggestions on how to handle such missing values / interrelated variables would be greatly appreciated.
Thanks
How about you use a value like 0 or -1 for the missing values you want to treat as a valid category rather than as missing?
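A minimal sketch of that recoding, assuming hypothetical dataset and variable names (policies, n_claims, max_settle_days, max_claim_amt); none of these names come from the original question. It replaces the "not applicable" missing values in variables 5 and 6 with a sentinel outside the valid range, and leaves the "unknown" missing values in variables 2 and 3 untouched so that assignmissing= still applies to them:

```sas
/* Sketch only: the dataset name and all variable names are hypothetical. */
data policies_model;
    set policies;

    /* Variables 5 and 6 are missing only because no claims were made.   */
    /* Recode them to -1 (outside the valid range) so that they are no   */
    /* longer treated as missing values.                                 */
    if n_claims = 0 then do;
        if missing(max_settle_days) then max_settle_days = -1;
        if missing(max_claim_amt)   then max_claim_amt   = -1;
    end;

    /* Variables 2 and 3 (sales channel, country) are left as they are:  */
    /* a missing value there still means "unknown".                      */
run;
```

A negative sentinel is used here rather than 0 so that it cannot be confused with a real settlement time or claim amount of zero.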
Thank you! Yes, I will use a negative value. (This website had told me that my submission of this question was unsuccessful, so I was surprised to receive this response here!)
Edit: I should also note that for the regression I made sure to include interaction terms for variables 4 and 5 and for variables 4 and 6.
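For anyone following along, a model with those two interaction terms might look roughly like the sketch below. All names are hypothetical (policies_model, addon, n_customers, sales_channel, country, n_claims, max_settle_days, max_claim_amt), the response is assumed to be coded 0/1, and variables 5 and 6 are assumed to have already been recoded to a negative sentinel where no claims were made:

```sas
/* Sketch only: the dataset name and all variable names are hypothetical. */
proc logistic data=policies_model;
    /* Sales channel and country are categorical. By default, rows with  */
    /* a missing value in any explanatory variable are excluded; the     */
    /* MISSING option on the CLASS statement would instead treat missing */
    /* as a valid level.                                                 */
    class sales_channel country / param=ref;

    /* Main effects plus the 4x5 and 4x6 interactions.                   */
    model addon(event='1') = n_customers sales_channel country
                             n_claims max_settle_days max_claim_amt
                             n_claims*max_settle_days
                             n_claims*max_claim_amt;
run;
```

Because the sentinel enters max_settle_days and max_claim_amt as if it were a real number, it is worth checking how the fitted effects behave for the no-claims group before relying on them.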