EC27556
Quartz | Level 8

I have datasets of more than 1 million observations, in which the proportion of observations where the target variable is "true" ranges from 20% down to 0.1%.

 

When E-Miner builds a decision tree, does it use all 1 million observations, or does it take a sample of the data when pruning?

 

I'm slightly concerned that if E-Miner samples the data before pruning, there is a significant chance that the splits will be biased, for example if very few of the 0.1% target cases are selected. In many cases where the target percentage is very small (often <1%), E-Miner cannot produce a tree at all. Is that possibly because it is not randomly selecting any of the 0.1% target cases?
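To put rough numbers on this concern: if a simple random sample of n observations is drawn from data where a fraction p of rows have the target set to true, the number of targets in the sample is approximately binomial (exactly so when sampling with replacement). The expected count is n·p, and the chance of getting none at all is (1 − p)^n. A minimal Python sketch of that arithmetic follows; the function name and the sample sizes are hypothetical and for illustration only, not what E-Miner actually uses.

```python
import math

def rare_target_sample_stats(n_sample: int, target_rate: float) -> tuple[float, float]:
    """Return (expected number of target=true rows, probability of zero such rows)
    for a simple random sample, using the binomial approximation."""
    expected = n_sample * target_rate
    # P(X = 0) = (1 - p)^n, computed via log1p for numerical stability with tiny p
    p_none = math.exp(n_sample * math.log1p(-target_rate))
    return expected, p_none

# Hypothetical sample sizes, for illustration only
for n in (1_000, 5_000, 20_000):
    expected, p_none = rare_target_sample_stats(n, 0.001)
    print(f"n={n:>6}: expected targets={expected:6.1f}, P(no targets)={p_none:.4f}")
```

With p = 0.001, a 1,000-row random sample contains no targets roughly 37% of the time, and even a 5,000-row sample holds only about 5 targets on average, which is too few to support reliable splits. Sampling within each target class separately (stratifying on the target) avoids the zero-target case entirely.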

 

Linked to the above: does anyone know the optimal ratio of target 'hits' to 'non-hits' for decision tree analysis? For example, is it OK for about 10% of the data to have a hit for the target variable? I am considering sampling my data before running the decision tree analysis so that about 10% of it has the target variable true and 90% does not.
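The sampling plan described above (keep every target row and randomly thin the non-target rows until targets make up about 10% of the data) can be sketched in Python as follows. This is a generic undersampling sketch, not what E-Miner does internally; the function names are hypothetical, and rows are assumed to be dicts with a boolean target field. Because the sampled data overstates the true target rate, predicted probabilities from a model trained on it generally need rescaling back to the original rate; the second function applies the standard prior-correction formula for that.

```python
import random

def undersample_to_rate(rows, target_key, desired_rate, seed=0):
    """Keep every row where rows[target_key] is True plus a random subset of the
    other rows, so that targets make up roughly `desired_rate` of the result."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[target_key]]
    negatives = [r for r in rows if not r[target_key]]
    # Negatives needed so that positives / (positives + negatives) == desired_rate
    n_neg = round(len(positives) * (1 - desired_rate) / desired_rate)
    n_neg = min(n_neg, len(negatives))  # cannot keep more negatives than exist
    sample = positives + rng.sample(negatives, n_neg)
    rng.shuffle(sample)
    return sample

def adjust_to_original_prior(p_sample, rate_sample, rate_original):
    """Rescale a probability predicted on the sampled data back to the
    original population's target rate (standard prior correction)."""
    pos = p_sample * rate_original / rate_sample
    neg = (1 - p_sample) * (1 - rate_original) / (1 - rate_sample)
    return pos / (pos + neg)

# Illustrative data: 100,000 rows with a 1% target rate, sampled up to 10%
data = [{"target": i % 100 == 0} for i in range(100_000)]
sample = undersample_to_rate(data, "target", 0.10)
print(len(sample), sum(r["target"] for r in sample))
```

Keeping all of the rare targets and discarding only non-targets preserves every scarce positive example, which matters most when the original rate is 0.1%.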

pink_poodle
Barite | Level 11
SAS Enterprise Miner can split the data into training, validation, and test datasets, and the partition can be user-defined: https://support.sas.com/documentation/onlinedoc/miner/casestudy_59123.pdf.
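With a rare target, it helps if each partition keeps roughly the same target rate, which is usually done by stratifying on the target. A minimal Python sketch of a stratified training/validation/test split follows. It assumes hypothetical 60/30/10 proportions and rows stored as dicts; the function name is illustrative, and this is not Enterprise Miner's own partitioning code.

```python
import random
from collections import defaultdict

def stratified_partition(rows, target_key, fractions=(0.6, 0.3, 0.1), seed=0):
    """Split rows into (train, validation, test), sampling separately within each
    target class so every partition keeps roughly the original target rate."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for r in rows:
        by_class[r[target_key]].append(r)
    parts = ([], [], [])
    for class_rows in by_class.values():
        rng.shuffle(class_rows)
        n = len(class_rows)
        n_train = round(n * fractions[0])
        n_valid = round(n * fractions[1])
        parts[0].extend(class_rows[:n_train])
        parts[1].extend(class_rows[n_train:n_train + n_valid])
        parts[2].extend(class_rows[n_train + n_valid:])  # remainder goes to test
    return parts
```

Splitting within each class means a 0.1% target cannot end up concentrated in one partition purely by chance.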
Sensitivity measures how well the model identifies positive cases. If "hit" = true positive and "miss" = false negative, then sensitivity = hits/(hits + misses). A 1:1 hit:miss ratio gives a sensitivity of 0.5; a 2:1 ratio gives about 0.67. A sensitivity between 70% and 100% is generally considered good.
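The sensitivity arithmetic in the reply can be checked with a few lines of Python (the function name is illustrative):

```python
def sensitivity(hits: int, misses: int) -> float:
    """Sensitivity (recall, true positive rate) = TP / (TP + FN),
    where hits are true positives and misses are false negatives."""
    if hits + misses == 0:
        raise ValueError("no actual positive cases")
    return hits / (hits + misses)

print(sensitivity(1, 1))            # 1:1 hit:miss ratio -> 0.5
print(round(sensitivity(2, 1), 2))  # 2:1 hit:miss ratio -> 0.67
```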
