EC27556
Quartz | Level 8

I have datasets of more than 1 million observations, in which the proportion of observations where the target variable is "true" ranges from 20% down to 0.1%.

When Enterprise Miner builds a decision tree, does it consider all 1 million observations, or does it take a sample of the data when pruning?

I'm slightly concerned that if Enterprise Miner samples the data before pruning, there is a significant chance that the splits will be biased if, say, very few of the 0.1% target observations are selected. In many cases where the percentage is very small (often <1%), Enterprise Miner cannot produce a tree. Is that possibly because the sample happens to contain none of the 0.1% target observations?
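
To put rough numbers on this concern, here is a minimal Python sketch of the arithmetic. It assumes a simple random sample, which may or may not be what Enterprise Miner does (that is the open question here). The function names and the sample sizes in the usage lines are hypothetical, chosen only for illustration.

```python
def expected_events(sample_size: int, event_rate: float) -> float:
    """Expected number of target events in a simple random sample."""
    return sample_size * event_rate


def prob_no_events(sample_size: int, event_rate: float) -> float:
    """Probability that a simple random sample contains zero target events.

    Treats each draw as independent, which is a close approximation
    when the sample is small relative to the full dataset.
    """
    return (1.0 - event_rate) ** sample_size


# Hypothetical sample sizes, with the 0.1% event rate from the question.
for n in (1_000, 10_000, 100_000):
    print(n, expected_events(n, 0.001), prob_no_events(n, 0.001))
```

Under these assumptions, the chance of drawing no events at all falls quickly as the sample grows, but the expected count of events stays small for small samples, which is what would make splits unstable.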

Related to the above: does anyone know the optimal ratio of target 'hits' to 'non-hits' for decision tree analysis? For example, is it OK if about 10% of the data has a hit for the target variable? I am considering sampling my data before running the decision tree analysis so that about 10% of observations have the target variable true and 90% do not.
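
The sampling idea described above could be sketched as follows in Python. This is an illustration only, not Enterprise Miner functionality: it keeps every target event and randomly draws just enough non-events to reach the chosen event rate. The function name, the `target` key, and the row layout (a list of dicts) are assumptions made for the example.

```python
import random


def undersample_non_events(rows, target_key="target", event_rate=0.10, seed=0):
    """Keep all events and sample non-events so events make up `event_rate`."""
    if not 0.0 < event_rate < 1.0:
        raise ValueError("event_rate must be strictly between 0 and 1")
    rng = random.Random(seed)
    events = [r for r in rows if r[target_key]]
    non_events = [r for r in rows if not r[target_key]]
    # Number of non-events needed for the requested event share.
    wanted = round(len(events) * (1.0 - event_rate) / event_rate)
    # If there are too few non-events, keep all of them.
    kept = rng.sample(non_events, min(wanted, len(non_events)))
    sample = events + kept
    rng.shuffle(sample)
    return sample


# Hypothetical data: 1,000 rows with a 1% event rate.
data = [{"id": i, "target": i < 10} for i in range(1000)]
balanced = undersample_non_events(data, event_rate=0.10)
print(len(balanced), sum(r["target"] for r in balanced))
```

Note that a tree trained on data resampled this way produces predicted probabilities that reflect the 10% rate in the sample rather than the event rate in the original data.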

1 REPLY
pink_poodle
Barite | Level 11
SAS Enterprise Miner can split the data into training, validation, and test datasets, and the partition can be user-defined; see https://support.sas.com/documentation/onlinedoc/miner/casestudy_59123.pdf
Sensitivity measures how well the model identifies positive cases. If a "hit" is a true positive and a "miss" is a false negative, then sensitivity = hits / (hits + misses). A 1:1 hit:miss ratio gives a sensitivity of 0.5, and a 2:1 ratio gives a sensitivity of about 0.67. A sensitivity between 70% and 100% is often considered good.
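
As a quick check of the formula above, here is a minimal Python sketch. The function name is made up for this example and is not part of any SAS product.

```python
def sensitivity(hits: int, misses: int) -> float:
    """Sensitivity = true positives / (true positives + false negatives)."""
    total = hits + misses
    if total == 0:
        raise ValueError("sensitivity is undefined when there are no positive cases")
    return hits / total


# The two ratios mentioned above: 1:1 and 2:1 hits to misses.
print(sensitivity(1, 1))
print(round(sensitivity(2, 1), 2))
```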
