predictive modelling / handling missing values


04-19-2016 03:15 AM

This question is an overlap of methodology and SAS programming - hopefully it fits here ...

I wish to build a predictive model with explanatory variables that have different types of missing values.

e.g. (this is made up)

Response Variable: Primary policy holder of a current insurance policy purchases an additional insurance benefit (add-on)

Explanatory Variable 1: Number of customers on policy (no missing values)

Explanatory Variable 2: How the current policy was purchased (sales channel) (can include missing values. Missing values = unknown)

Explanatory Variable 3: Country of origin (can include missing values. Missing values = unknown)

Explanatory Variable 4: How many claims has the customer made (0+) (no missing values)

Explanatory Variable 5: Maximum settlement time of claims made (if missing, this is because no claims were made)

Explanatory Variable 6: Maximum claim amount (if missing, this is because no claims were made)

Is there a way to distinguish the "missing value" in explanatory variables 5 and 6 (missing because it is not applicable) from the missing values in explanatory variables 2 and 3 (missing because they are unknown)?

Effectively, I want to treat the missing values in explanatory variables 5 and 6 as a category of their own, but those in variables 2 and 3 as genuinely missing.

My first step was to use hpsplit to gauge which interactions to include in a logistic regression model (as per the paper: Methods for Interaction Detection in Predictive Modeling Using SAS). I see that SAS has special missing values (.a - .z); however, it doesn't seem that hpsplit treats them differently. There seems to be a blanket treatment for all missing values via assignmissing=BRANCH|NONE|POPULAR|SIMILAR (from SAS/STAT® 14.1 User’s Guide, The HPSPLIT Procedure).

Any suggestions would be greatly appreciated on how to handle such missing values / interrelated variables.

Thanks

Accepted Solutions

Solution

04-19-2016
07:01 PM


Posted in reply to mduarte

04-19-2016 07:07 AM

How about you use a value like 0 or -1 for the missing values you want to use?
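A minimal sketch of that recoding, assuming a hypothetical data set `work.policies` with made-up variable names `n_claims`, `max_settle_days` and `max_claim_amt` (none of these names come from the question). It only makes sense if real settlement times and claim amounts can never be negative, so that -1 cannot be confused with an observed value:

```sas
/* Hypothetical names throughout: work.policies, n_claims,            */
/* max_settle_days, max_claim_amt. Adjust to the real data.           */
data work.policies_recoded;
    set work.policies;

    /* No claims made: the claim measures are "not applicable",       */
    /* not unknown, so give them a sentinel value instead of missing. */
    if n_claims = 0 then do;
        if missing(max_settle_days) then max_settle_days = -1;
        if missing(max_claim_amt)   then max_claim_amt   = -1;
    end;

    /* Sales channel and country are left untouched, so their         */
    /* missing values are still treated as missing downstream.        */
run;
```

Because -1 sits below every real value, a tree split can separate the "no claims" group from the rest, while the unknown values in the other variables still go through the procedure's usual missing-value handling.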

All Replies


Posted in reply to ChrisNZ

04-19-2016 07:01 PM - edited 04-28-2016 08:55 AM

Thank you! Yes, I will use a negative value. (This website had told me that my submission of this question was unsuccessful, so I was surprised to receive this response here!)

**Edit**: I should also note that for the regression I made sure to include interaction terms between variables 4 and 5, and between variables 4 and 6.
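For anyone reading later, a rough sketch of what such a model statement could look like. All names are hypothetical (`work.policies_recoded`, `bought_addon`, `n_customers`, `sales_channel`, `country`, `n_claims`, `max_settle_days`, `max_claim_amt`), and it assumes the not-applicable values in variables 5 and 6 have already been set to -1 and that the response is coded 0/1:

```sas
/* Hypothetical data set and variable names; not from the original post. */
proc logistic data=work.policies_recoded;
    /* Categorical predictors (variables 2 and 3) */
    class sales_channel country / param=ref;

    /* Main effects plus the two interactions mentioned above:           */
    /* claims count with settlement time, claims count with claim amount */
    model bought_addon(event='1') =
          n_customers sales_channel country
          n_claims max_settle_days max_claim_amt
          n_claims*max_settle_days
          n_claims*max_claim_amt;
run;
```

With this coding the interaction terms are zero whenever `n_claims` is 0, so the -1 placeholder enters only through the main effects for those rows. Note also that PROC LOGISTIC by default leaves out observations with a missing value in any explanatory variable, so rows with unknown sales channel or country would not be used in the fit.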