Quartz | Level 8

This question overlaps methodology and SAS programming, so hopefully it fits here.


I wish to build a predictive model with explanatory variables that have different types of missing values.


e.g. (this is made up)

Response Variable: Primary policy holder of a current insurance policy purchases an additional insurance benefit (add-on)

Explanatory Variable 1: Number of customers on policy (no missing values)

Explanatory Variable 2: How the current policy was purchased (sales channel) (missing values = unknown)

Explanatory Variable 3: Country of origin (can include missing values.  Missing values = unknown)

Explanatory Variable 4: How many claims has the customer made (0+) (no missing values)

Explanatory Variable 5: Maximum settlement time of claims made (if missing, this is because no claims were made)

Explanatory Variable 6: Maximum claim amount (if missing, this is because no claims were made)


Is there a way to distinguish the "missing value" in explanatory variables 5 and 6 (missing because the variable is not applicable) from the missing values in explanatory variables 2 and 3?


Effectively, I want to treat the missing values in explanatory variables 5 and 6 as a category of their own, but treat those in variables 2 and 3 as genuinely missing.


My first step was to use PROC HPSPLIT to gauge which interactions to include in a logistic regression model (as per the paper "Methods for Interaction Detection in Predictive Modeling Using SAS"). I see that SAS has special missing values (.a - .z); however, HPSPLIT does not seem to treat them differently from ordinary missing values. There appears to be a blanket treatment for all missing values via assignmissing=BRANCH|NONE|POPULAR|SIMILAR (from SAS/STAT® 14.1 User's Guide, The HPSPLIT Procedure).
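
For readers following along, the setup described above might look roughly like the sketch below. The dataset and variable names (policies, addon, n_customers, sales_channel, country, n_claims, max_settle_days, max_claim_amt) are hypothetical and not from the original post, and the choice of assignmissing=similar is only one of the listed options. The point is that .N is still a missing value to SAS, so the single assignmissing= setting applies to it in the same way as to an ordinary missing value.

```sas
/* Sketch only: all dataset and variable names are hypothetical. */
data policies_tagged;
   set policies;
   /* .N tags "not applicable" (no claims made); SAS still treats it as missing */
   if n_claims = 0 and missing(max_settle_days) then max_settle_days = .N;
   if n_claims = 0 and missing(max_claim_amt)   then max_claim_amt   = .N;
run;

/* One assignmissing= setting covers every missing value, special or not */
proc hpsplit data=policies_tagged assignmissing=similar;
   class addon sales_channel country;
   model addon = n_customers sales_channel country
                 n_claims max_settle_days max_claim_amt;
run;
```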


Any suggestions would be greatly appreciated on how to handle such missing values / interrelated variables.




Accepted Solutions
Tourmaline | Level 20

How about you use a value like 0 or -1 for the missing values you want to use?
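
A minimal sketch of this suggestion, assuming hypothetical dataset and variable names (policies, n_claims, max_settle_days, max_claim_amt) that do not appear in the original post: the "not applicable" missing values are replaced with -1 only where no claims were made, while the truly unknown values in the other variables are left missing.

```sas
/* Sketch only: all dataset and variable names are hypothetical. */
data policies_recoded;
   set policies;
   /* Structurally missing (no claims made): replace with a sentinel value */
   if n_claims = 0 then do;
      if missing(max_settle_days) then max_settle_days = -1;
      if missing(max_claim_amt)   then max_claim_amt   = -1;
   end;
   /* sales_channel and country are left untouched, so unknown stays missing */
run;
```

A negative sentinel is used here rather than 0 so that it cannot be confused with a real settlement time or claim amount, assuming both are non-negative in the data.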

Quartz | Level 8

Thank you! Yes, I will use a negative value. (This website had told me that my submission of this question was unsuccessful, so I was surprised to receive this response here!)


Edit: I should also note that for the regression I made sure to include interaction terms for variables 4 and 5, and for variables 4 and 6.
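
As a rough illustration of that edit, a logistic model with those two interactions could be specified as below. This is a sketch under assumptions: the names are hypothetical, addon is assumed to be coded 0/1, sales_channel and country are assumed to be categorical, and policies_recoded is assumed to be a dataset in which the "no claims" missing values of variables 5 and 6 have already been replaced by a negative value.

```sas
/* Sketch only: all dataset and variable names are hypothetical. */
proc logistic data=policies_recoded;
   class sales_channel country / param=ref;
   model addon(event='1') = n_customers sales_channel country
                            n_claims max_settle_days max_claim_amt
                            n_claims*max_settle_days   /* variables 4 and 5 */
                            n_claims*max_claim_amt;    /* variables 4 and 6 */
run;
```

By default PROC LOGISTIC drops observations with a missing value in any explanatory variable, so rows with an unknown sales channel or country are excluded from this fit, while the recoded rows for variables 5 and 6 are kept.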

