
predictive modelling / handling missing values

Frequent Contributor
Posts: 84

This question overlaps methodology and SAS programming; hopefully it fits here.

 

I wish to build a predictive model with explanatory variables that have different types of missing values.

 

For example (this scenario is made up):

Response Variable: Whether the primary policy holder of a current insurance policy purchases an additional insurance benefit (add-on)

Explanatory Variable 1: Number of customers on the policy (no missing values)

Explanatory Variable 2: How the current policy was purchased (sales channel) (missing values = unknown)

Explanatory Variable 3: Country of origin (can include missing values; missing values = unknown)

Explanatory Variable 4: Number of claims the customer has made (0+) (no missing values)

Explanatory Variable 5: Maximum settlement time of claims made (if missing, this is because no claims were made)

Explanatory Variable 6: Maximum claim amount (if missing, this is because no claims were made)

 

Is there a way to distinguish the missing values in explanatory variables 5 and 6 (which mean "not applicable") from the missing values in explanatory variables 2 and 3 (which mean "unknown")?

 

Effectively, I want to treat the missing values in explanatory variable 5 as a category in their own right, but those in variables 2 and 3 as genuinely missing.
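One way to get this, sketched in Python for illustration rather than as SAS code, is to recode variable 5 into categorical levels before modelling: the "no claims" case gets its own level, while genuinely unknown values stay missing. The field names, the `NO_CLAIMS` label and the bin cut points are hypothetical assumptions, not anything defined by SAS or HPSPLIT.

```python
from typing import Optional

# Hypothetical label for the structural "not applicable" case.
NO_CLAIMS = "NO_CLAIMS"


def settlement_level(num_claims: int, max_settle_days: Optional[float]) -> Optional[str]:
    """Recode explanatory variable 5 (maximum settlement time) as a category.

    A missing time for a customer with no claims is structural, so it becomes
    its own level. A missing time for a customer who did claim is genuinely
    unknown and is returned as None, so it is still treated as missing.
    """
    if max_settle_days is None:
        return NO_CLAIMS if num_claims == 0 else None
    # Hypothetical cut points in days; real bins would come from the data.
    if max_settle_days <= 30:
        return "<=30"
    if max_settle_days <= 90:
        return "30-90"
    return ">90"


# Sales channel (variable 2) is left alone: None still means "unknown".
rows = [
    {"num_claims": 0, "sales_channel": "online", "max_settle_days": None},
    {"num_claims": 2, "sales_channel": None, "max_settle_days": 45.0},
    {"num_claims": 1, "sales_channel": "broker", "max_settle_days": None},
]
levels = [settlement_level(r["num_claims"], r["max_settle_days"]) for r in rows]
print(levels)  # ['NO_CLAIMS', '30-90', None]
```

The choice here is that "not applicable" is decided by the claim count rather than by the settlement variable alone. The third row is a case the described data should not contain (a claimant with no recorded time); the sketch keeps it missing instead of silently labelling it "no claims".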

 

My first step was to use PROC HPSPLIT to gauge which interactions to include in a logistic regression model (following the paper "Methods for Interaction Detection in Predictive Modeling Using SAS"). I see that SAS has special missing values (.a through .z); however, HPSPLIT does not seem to treat them differently. There appears to be a blanket treatment for all missing values via assignmissing=BRANCH|NONE|POPULAR|SIMILAR (from the SAS/STAT® 14.1 User's Guide, "The HPSPLIT Procedure").

 

Any suggestions on how to handle such missing values and interrelated variables would be greatly appreciated.

 

Thanks


Accepted Solution
04-19-2016 07:01 PM
PROC Star
Posts: 1,760

Re: predictive modelling / handling missing values

How about using a value like 0 or -1 for the missing values you want to treat as a category?
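The suggestion can be sketched as follows, again in Python for illustration rather than as SAS code. It assumes hypothetical field names and a sentinel of -1, chosen because settlement times and claim amounts cannot be negative. Only the structurally missing values in variables 5 and 6 are recoded; the unknown values in variables 2 and 3 stay missing.

```python
from typing import Optional

# Assumed sentinel: outside the valid range of both variables (they are >= 0).
SENTINEL = -1.0


def recode_not_applicable(num_claims: int, value: Optional[float]) -> Optional[float]:
    """Replace a value that is missing only because there were no claims.

    Values that are missing for any other reason are returned unchanged (None),
    so they are still treated as missing downstream.
    """
    if value is None and num_claims == 0:
        return SENTINEL
    return value


# Hypothetical record: no claims, and unknown sales channel and country.
record = {"num_claims": 0, "sales_channel": None, "country": None,
          "max_settle_days": None, "max_claim_amt": None}
for field in ("max_settle_days", "max_claim_amt"):
    record[field] = recode_not_applicable(record["num_claims"], record[field])
print(record)
```

For a tree, 0 would also work, but a negative value cannot be confused with a genuine zero (a claim settled the same day, say). In a regression the sentinel is treated as an ordinary number, so it also influences the fitted slope of that variable.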



All Replies

Frequent Contributor
Posts: 84

Re: predictive modelling / handling missing values


Thank you! Yes, I will use a negative value. (This website had told me that my submission of this question had been unsuccessful, so I was surprised to receive this response here!)

 

Edit: I should also note that in the regression I made sure to include interaction terms for variables 4 and 5, and for variables 4 and 6.
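A quick way to see how these interaction terms behave with the sentinel coding: every customer with a structurally missing value has zero claims, so the product of variable 4 with the sentinel-coded variable 5 (or 6) is zero for them whichever sentinel is used. The Python sketch below is illustrative only; the function name and data are hypothetical.

```python
from typing import List, Optional


def interaction(num_claims: List[int], other: List[Optional[float]]) -> List[Optional[float]]:
    """Element-wise product of variable 4 with variable 5 or 6.

    None (a genuinely unknown value) propagates as missing.
    """
    return [None if v is None else n * v for n, v in zip(num_claims, other)]


claims = [0, 0, 3, 1]
# Rows 1 and 2 use two different sentinels for "no claims"; row 4 is unknown.
settle = [-1.0, -99.0, 40.0, None]
result = interaction(claims, settle)
print(result)
```

The first two products evaluate to zero, so the interaction column only varies among customers who made claims. The sentinel still matters for the main effect of variable 5 or 6 if that term is also in the model.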

 
