Classification Matrix Target

10-28-2016 11:35 AM

The output for my Train data set has missing values. What would be the cause? See image below.

Accepted Solutions

Solution

10-28-2016
07:48 PM

10-28-2016 03:56 PM

RPM actually splits your data into training and validation data sets. It does a 50/50 split of the data, and the split is a stratified sample that uses the target (dependent) variable to stratify.
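To make the stratified 50/50 split concrete, here is a minimal Python sketch of the idea. It is not RPM's own code; the function name `stratified_split`, the column name `resp`, and the 1000-row data set are all made up for illustration.

```python
import random
from collections import defaultdict


def stratified_split(rows, target_key, train_fraction=0.5, seed=42):
    """Split rows into (train, validation) while preserving the target mix."""
    rng = random.Random(seed)

    # Group the rows by target level so each level is split separately.
    by_level = defaultdict(list)
    for row in rows:
        by_level[row[target_key]].append(row)

    train, validation = [], []
    for level_rows in by_level.values():
        shuffled = level_rows[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        validation.extend(shuffled[cut:])
    return train, validation


# Hypothetical data: 1000 rows with a 2% response rate.
rows = [{"id": i, "resp": 1 if i < 20 else 0} for i in range(1000)]
train, validation = stratified_split(rows, "resp")

train_rate = sum(r["resp"] for r in train) / len(train)
valid_rate = sum(r["resp"] for r in validation) / len(validation)
print(len(train), len(validation), train_rate, valid_rate)
```

Because each target level is split on its own, both halves keep the same 2% response rate, which also means the training half only sees half of the already rare responders.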

You can sample before using RPM. With only 2% response you may want to take all of those that responded and a sample of those who didn't.

For example, if I had a data set with 2% respondents and the data set had 1000 rows, I would take all 20 respondents and maybe 200 nonrespondents. This would give me approximately 10% respondents and 90% nonrespondents (20 of 220 rows, or about 9%). If possible, I would suggest having your respondents represent at least 10-20% of the rows in your data mining data set. This should give you more stability in your model.
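The sampling in that example can be sketched in Python as follows. This is only an illustration under assumed names (`undersample`, `resp`) and made-up data; in practice you would do this step with your usual sampling tool before handing the data to RPM.

```python
import random


def undersample(rows, target_key, n_nonevents, seed=42):
    """Keep every respondent plus a random sample of nonrespondents."""
    rng = random.Random(seed)
    events = [r for r in rows if r[target_key] == 1]
    nonevents = [r for r in rows if r[target_key] == 0]
    kept = rng.sample(nonevents, min(n_nonevents, len(nonevents)))
    sample = events + kept
    rng.shuffle(sample)  # avoid leaving all respondents at the top
    return sample


# Hypothetical data: 1000 rows, 20 respondents (2%).
rows = [{"id": i, "resp": 1 if i < 20 else 0} for i in range(1000)]
sample = undersample(rows, "resp", n_nonevents=200)

event_rate = sum(r["resp"] for r in sample) / len(sample)
print(f"{len(sample)} rows, {event_rate:.1%} respondents")
# prints "220 rows, 9.1% respondents"
```

With 200 nonrespondents the mix is 20 of 220, which is the "approximately 10%" quoted above. Keeping 180 nonrespondents instead would give exactly 10%.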

You can also use the decision processing within RPM to indicate the prior probabilities. Here's a paper that shows how to assign prior probabilities: https://support.sas.com/resources/papers/proceedings10/113-2010.pdf. Here's a tip that talks about doing so within EM: https://communities.sas.com/t5/SAS-Communities-Library/Tip-How-to-model-a-rare-target-using-an-overs...
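As a rough illustration of why the priors matter after oversampling, the sketch below applies the standard Bayes rescaling that maps a probability estimated on the sampled mix back to the true population rate. I am not claiming this is exactly what RPM or EM does internally; the function name `adjust_for_priors` and the numbers (a 20-of-220 sample, a 2% true rate) are assumptions carried over from the example above.

```python
def adjust_for_priors(p_sample, sample_rate, true_rate):
    """Rescale an event probability from the sampled mix to the true prior."""
    # Weight each class by (true proportion / sampled proportion).
    event = p_sample * true_rate / sample_rate
    nonevent = (1.0 - p_sample) * (1.0 - true_rate) / (1.0 - sample_rate)
    return event / (event + nonevent)


sample_rate = 20 / 220  # respondents in the oversampled training data
true_rate = 0.02        # historical response rate used as the prior

for p in (0.05, sample_rate, 0.50, 0.90):
    adjusted = adjust_for_priors(p, sample_rate, true_rate)
    print(f"model says {p:.3f} -> adjusted {adjusted:.3f}")
```

A score equal to the sampled response rate maps back to the 2% prior, and every other score shrinks accordingly, so the ranking of customers is unchanged but the probabilities are on the right scale for decisions.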

All Replies

10-28-2016 01:00 PM - edited 10-28-2016 01:03 PM

100% of the observations in the Train data set that were Target=0 were predicted to be Target=0. There were no false positives here, so that cell is simply shown as missing.

On the other hand, your false negative rate is really high...you should look into that.
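Here is a small Python sketch of how an empty cell can end up as "missing" rather than zero, and how the false negative rate is read off the matrix. The counts are invented for illustration (the original screenshot is not reproduced here), and `classification_matrix` is a hypothetical helper, not an RPM function.

```python
from collections import Counter


def classification_matrix(actual, predicted):
    """Count (actual, predicted) pairs; cells with no rows come back as None."""
    counts = Counter(zip(actual, predicted))
    levels = sorted(set(actual) | set(predicted))
    return {(a, p): counts.get((a, p)) for a in levels for p in levels}


# Hypothetical training data: 490 nonrespondents, 10 respondents.
actual = [0] * 490 + [1] * 10
# Every Target=0 row is predicted 0; only 1 of the 10 respondents is caught.
predicted = [0] * 490 + [0] * 9 + [1] * 1

matrix = classification_matrix(actual, predicted)
for (a, p), n in matrix.items():
    print(f"actual={a} predicted={p}: {n if n is not None else '.'}")

false_negatives = matrix[(1, 0)] or 0
true_positives = matrix[(1, 1)] or 0
false_negative_rate = false_negatives / (false_negatives + true_positives)
print(f"false negative rate: {false_negative_rate:.0%}")
```

The actual=0, predicted=1 cell has no rows behind it, so it shows up as a missing value, while the actual=1, predicted=0 cell is where the real problem sits.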

10-28-2016 01:11 PM

Thank you for your reply. I have run a proc means on the input and everything appears as I would expect. I am using RPM - Intermediate in Enterprise Guide. What should I be looking at?

10-28-2016 01:41 PM

It's not so much about the inputs in your data set here. The model is just not good at accurately predicting positive responses. Perhaps your data set is very imbalanced (is target=1 a rare event?). Under "Decisions and priors" in the Model section of the RPM UI, what are the data proportions for your target?

10-28-2016 01:53 PM

Resp = 1 is 2%, which is pretty typical for a Direct Marketing Campaign. I have the Prior Probabilities and the Decision Function both set to NONE, as I am not sure how to use them.

10-28-2016 02:47 PM

Ok - what I might suggest then is oversampling to get a more balanced data set for training (i.e., more observations with target=1 to learn from) and then setting the priors according to the historical expectation (2% for level 1 in your case). Hopefully this will train a model that can better predict the rare event.

Good luck.

10-28-2016 02:48 PM

I have a 100% sample as my input. It looks like RPM is using a sample anyway, though, and I don't see a way to control that.

11-03-2016 03:36 AM

Hi,

If we want a similar classification matrix for the target in SAS Eminer, how can we get it?

I do get the matrix when running RPM (SAS EG to SAS Eminer), but when modelling in SAS Eminer directly, I am unable to get a similar matrix.

Regards

Amit Verma

11-03-2016 04:37 PM

In SAS Enterprise Miner you can get the same output as Rapid Predictive Modeler by using the Reporter Node under the Utility Tab.

Change the properties for the Reporter node to Style = Default and Nodes=Summary (like below). This will give you a scorecard and the classification matrix as well as other output.