mduarte
Quartz | Level 8

This question overlaps methodology and SAS programming; hopefully it fits here.

 

I wish to build a predictive model with explanatory variables that have different types of missing values.

 

e.g. (this is made up)

Response Variable: Primary policy holder of a current insurance policy purchases an additional insurance benefit (add-on)

Explanatory Variable 1: Number of customers on policy (no missing values)

Explanatory Variable 2: How the current policy was purchased (sales channel) (can include missing values. Missing values = unknown)

Explanatory Variable 3: Country of origin (can include missing values.  Missing values = unknown)

Explanatory Variable 4: How many claims has the customer made (0+) (no missing values)

Explanatory Variable 5: Maximum settlement time of claims made (if missing, this is because no claims were made)

Explanatory Variable 6: Maximum claim amount (if missing, this is because no claims were made)

 

Is there a way to distinguish the "missing value" in explanatory variables 5 and 6 (missing because the variable is not applicable) from the missing values in explanatory variables 2 and 3 (missing because the value is unknown)?

 

Effectively, I want to treat the missing values in explanatory variable 5 as a category in their own right, but those in variables 2 and 3 as missing.

 

My first step was to use hpsplit to gauge which interactions to include in a logistic regression model (as per the paper "Methods for Interaction Detection in Predictive Modeling Using SAS"). I see that SAS has special missing values (.a - .z); however, it doesn't seem that hpsplit treats them differently. It seems there is a blanket treatment for all missing values via assignmissing=BRANCH|NONE|POPULAR|SIMILAR (from SAS/STAT® 14.1 User's Guide, The HPSPLIT Procedure).

 

Any suggestions on how to handle such missing values / interrelated variables would be greatly appreciated.

 

Thanks

Accepted Solution
ChrisNZ
Tourmaline | Level 20

How about using a value like 0 or -1 for the missing values you want to treat as a category?
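
As a minimal sketch of this suggestion (written in Python rather than SAS, purely to illustrate the logic; the field names `n_claims`, `max_settlement_days`, `max_claim_amount` and `sales_channel`, and the helper `recode_not_applicable`, are hypothetical and not from the original post), a missing claim field is recoded to a sentinel of -1 only when the customer has made no claims, while unknown values in the other variables are left missing:

```python
NOT_APPLICABLE = -1  # sentinel; assumes genuine values are never negative

# Hypothetical field names standing in for variables 5 and 6 in the question.
CLAIM_FIELDS = ("max_settlement_days", "max_claim_amount")


def recode_not_applicable(record):
    """Return a copy in which claim fields that are missing because the
    customer made no claims are set to the sentinel. Any other missing
    value (None) is left untouched, so it still counts as 'unknown'."""
    out = dict(record)
    if out["n_claims"] == 0:
        for field in CLAIM_FIELDS:
            if out[field] is None:
                out[field] = NOT_APPLICABLE
    return out


customers = [
    {"n_claims": 0, "max_settlement_days": None, "max_claim_amount": None,
     "sales_channel": None},
    {"n_claims": 2, "max_settlement_days": 14, "max_claim_amount": 950.0,
     "sales_channel": "broker"},
]
recoded = [recode_not_applicable(c) for c in customers]
```

A negative sentinel keeps "no claims" distinct from a genuine value of 0, which matters if a settlement time or claim amount of 0 can actually occur. The unknown sales channel stays missing and is still subject to whatever missing-value handling the modelling procedure applies.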

mduarte
Quartz | Level 8

Thank you! Yes, I will use a negative value. (This website had told me that my submission of this question was unsuccessful, so I was surprised to receive this response here!)

 

Edit: I should also note that for the regression I made sure to include interaction terms between variables 4 and 5 and between variables 4 and 6.
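
To illustrate how such interaction terms behave (a sketch in Python rather than SAS; the field names and the helper `add_interactions` are hypothetical, and a sentinel of -1 for "no claims" is assumed), the product of the claim count with each claim variable is zero for every customer without claims, whatever sentinel was chosen:

```python
def add_interactions(record):
    """Add product terms for variables 4 x 5 and 4 x 6 (hypothetical names)."""
    out = dict(record)
    out["claims_x_settlement"] = out["n_claims"] * out["max_settlement_days"]
    out["claims_x_amount"] = out["n_claims"] * out["max_claim_amount"]
    return out


rows = [
    # No claims: variables 5 and 6 hold the assumed sentinel -1.
    {"n_claims": 0, "max_settlement_days": -1, "max_claim_amount": -1},
    # Three claims: variables 5 and 6 hold genuine values.
    {"n_claims": 3, "max_settlement_days": 10, "max_claim_amount": 500.0},
]
with_terms = [add_interactions(r) for r in rows]
```

The main effects of variables 5 and 6 still carry the sentinel, so a fitted model can depend on the value chosen; this sketch only shows how the product columns are formed.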

 
