TMiles
Quartz | Level 8

My output from the Train data set has missing values. What would be the cause? See the image below.

Capture.JPG

 

 

ACCEPTED SOLUTION
MelodieRush
SAS Employee

RPM actually splits your data into training and validation data sets. It does a 50/50 split of the data, stratified on the target (dependent) variable.
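
To illustrate what a 50/50 stratified split means, here is a minimal Python sketch. It is not RPM's implementation: the function name `stratified_split`, the `resp` column, and the 1000-row toy data are all assumptions made for illustration. The idea is only that each target level is split in half separately, so both halves keep the same response rate.

```python
import random
from collections import defaultdict


def stratified_split(rows, target_key, train_fraction=0.5, seed=0):
    """Split rows into (train, validation), sampling within each target level.

    Hypothetical helper for illustration; not how RPM does it internally.
    """
    rng = random.Random(seed)

    # Group the rows by the value of the target variable.
    by_level = defaultdict(list)
    for row in rows:
        by_level[row[target_key]].append(row)

    train, validation = [], []
    for level_rows in by_level.values():
        shuffled = level_rows[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        validation.extend(shuffled[cut:])
    return train, validation


# Toy data (assumed): 1000 rows with a 2% response rate.
rows = [{"id": i, "resp": 1 if i < 20 else 0} for i in range(1000)]
train, validation = stratified_split(rows, "resp")
```

With this toy data each half gets 500 rows, 10 of them respondents, so a 100% input sample still shows up as roughly half the rows in the Train output.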

 

You can sample before using RPM.  With only 2% response you may want to take all of those that responded and a sample of those who didn't.

 

For example, if I had a data set with 2% respondents and 1000 rows, I would take all 20 respondents and maybe 200 non-respondents. This would give me approximately 10% respondents and 90% non-respondents (20 of 220 rows is about 9%). If possible, I would suggest having your respondents represent at least 10-20% of the rows in your data mining data set. This should give you a more stable model.
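
The sampling example above can be sketched in Python as follows. This is a rough illustration only: the function name `oversample_rare_target`, the `resp` column, and the toy data are assumptions, and in practice you would do this step in SAS before handing the data to RPM.

```python
import random


def oversample_rare_target(rows, target_key, n_nonevents, seed=0):
    """Keep every event row and a simple random sample of non-event rows.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    events = [r for r in rows if r[target_key] == 1]
    nonevents = [r for r in rows if r[target_key] == 0]
    sampled = rng.sample(nonevents, min(n_nonevents, len(nonevents)))
    return events + sampled


# Toy data (assumed): 1000 rows, 20 respondents (2%).
rows = [{"id": i, "resp": 1 if i < 20 else 0} for i in range(1000)]
modeling_rows = oversample_rare_target(rows, "resp", n_nonevents=200)

event_rate = sum(r["resp"] for r in modeling_rows) / len(modeling_rows)
print(f"{len(modeling_rows)} rows, event rate {event_rate:.1%}")
# prints: 220 rows, event rate 9.1%
```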

 

You can also use the decision processing within RPM to indicate the prior probabilities. Here's a paper that shows how to assign prior probabilities: https://support.sas.com/resources/papers/proceedings10/113-2010.pdf. Here's a tip that talks about doing so within EM: https://communities.sas.com/t5/SAS-Communities-Library/Tip-How-to-model-a-rare-target-using-an-overs...
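
To give a feel for what specifying priors accomplishes, here is a small Python sketch of the standard Bayes rescaling of a predicted probability from the oversampled event rate back to the true event rate. Whether RPM or EM applies exactly this formula internally is an assumption on my part; the function name `adjust_for_priors` and the numbers are illustrative only.

```python
def adjust_for_priors(p_model, sample_rate, true_rate):
    """Rescale an event probability from the oversampled rate to the true rate.

    p_model     -- probability predicted by a model trained on oversampled data
    sample_rate -- event proportion in the oversampled training data
    true_rate   -- event proportion in the population (the prior)
    """
    event = p_model * true_rate / sample_rate
    nonevent = (1.0 - p_model) * (1.0 - true_rate) / (1.0 - sample_rate)
    return event / (event + nonevent)


# Illustrative numbers: trained on 20 events in 220 rows, true response rate 2%.
sample_rate = 20 / 220
true_rate = 0.02

# A prediction equal to the training event rate maps back to the true rate.
baseline = adjust_for_priors(sample_rate, sample_rate, true_rate)

# A prediction of 0.5 on the oversampled scale is much lower on the true scale.
adjusted_half = adjust_for_priors(0.5, sample_rate, true_rate)
```

The adjustment preserves the ranking of the scores; it only moves them onto the scale of the real 2% response rate.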

Catch the SAS Global Forum keynotes, announcements, and tech content!
sasglobalforum.com | #SASGF




9 REPLIES
BrettWujek
SAS Employee

100% of the observations in the Train data set that were Target=0 were predicted to be Target=0. There were no false positives here, so that cell is simply shown as missing.

 

On the other hand, your false negative rate is really high...you should look into that.
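
As a rough illustration of both points, here is a Python sketch that tallies a classification matrix. The counts are made up (they are not taken from the screenshot), and the function name `classification_matrix` is hypothetical. A combination that never occurs has no entry at all, which is why a report can show it as missing rather than 0.

```python
from collections import Counter


def classification_matrix(actual, predicted):
    """Count each (actual, predicted) pair; absent pairs have no entry."""
    return Counter(zip(actual, predicted))


# Made-up example: 490 non-events, all predicted 0;
# 10 events, of which only 1 is predicted 1.
actual = [0] * 490 + [1] * 10
predicted = [0] * 490 + [0] * 9 + [1]

matrix = classification_matrix(actual, predicted)

# No (actual=0, predicted=1) pair was ever counted, so the key is absent.
has_false_positive_cell = (0, 1) in matrix

# Counter returns 0 for absent keys, so the rates can still be computed.
false_negatives = matrix[(1, 0)]
true_positives = matrix[(1, 1)]
false_negative_rate = false_negatives / (false_negatives + true_positives)
```

In this made-up case the false negative rate is 9 out of 10 events, which is the kind of number worth investigating.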


Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

TMiles
Quartz | Level 8

Thank you for your reply. I have run PROC MEANS on the input and everything appears as I would expect. I am using RPM - Intermediate in Enterprise Guide. What should I be looking at?

BrettWujek
SAS Employee

It's not so much about the inputs in your data set here. The model is just not good at accurately predicting positive responses. Perhaps your data set is very imbalanced (is target=1 a rare event?). In "Decisions and priors" under the Model section in the RPM UI, what are the data proportions for your target?

 



TMiles
Quartz | Level 8

Resp = 1 is 2%, which is pretty typical for a direct marketing campaign. I have the Prior Probabilities and the Decision Function both set to NONE, as I am not sure how to use them.

BrettWujek
SAS Employee

OK. What I might suggest, then, is oversampling to get a more balanced data set for training (i.e., proportionally more observations with target=1 to learn from) and then setting the priors according to the historical expectation (2% for level 1 in your case). Hopefully this will train a model that can better predict the rare event.

 

Good luck.



TMiles
Quartz | Level 8

I have a 100% sample as my input. It looks like RPM is using a sample, though, and I don't see a way to control that.




amitvermajhs
Calcite | Level 5

Hi,

If we want a similar classification matrix for the target in SAS Eminer, how can we produce it?

I do get the matrix when running RPM (SAS EG to SAS Eminer), but when modelling directly in SAS Eminer I am unable to get a similar matrix.

 

Regards

Amit Verma

 

MelodieRush
SAS Employee

In SAS Enterprise Miner you can get the same output as Rapid Predictive Modeler by using the Reporter Node under the Utility Tab.

 

Change the properties for the Reporter node to Style=Default and Nodes=Summary (as shown below). This will give you a scorecard and the classification matrix, as well as other output.

 

2016-11-03_16-35-28.jpg

 

 






