<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Does E Miner take a sample of data when constructing decision trees with large datasets? in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Does-E-Miner-take-a-sample-of-data-when-constructing-decision/m-p/790672#M9033</link>
    <description>&lt;P&gt;I have datasets of more than 1 million observations, where the proportion of observations with the target variable "true" ranges from 20% down to 0.1%.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When E Miner builds a decision tree, does it use all 1 million observations, or does it take a sample of the data when pruning?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm slightly concerned that if E Miner samples the data before pruning, there is a significant chance that the splits will be biased if, say, very few of the 0.1% target cases are selected. In many cases where the percentage is very small (often &amp;lt;1%), E Miner cannot produce a tree. Is that possibly because the sample happens not to include any of the 0.1% target cases?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Linked to the above: does anyone know what the optimal ratio of target 'hits' to 'non-hits' is for decision tree analysis? I.e., is it OK if around 10% of the data has a hit for the target variable? I am considering sampling my data before I run the decision tree analysis so that about 10% of it has the target variable true and 90% does not.&lt;/P&gt;</description>
    <pubDate>Tue, 18 Jan 2022 12:07:47 GMT</pubDate>
    <dc:creator>EC27556</dc:creator>
    <dc:date>2022-01-18T12:07:47Z</dc:date>
    <item>
      <title>Does E Miner take a sample of data when constructing decision trees with large datasets?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Does-E-Miner-take-a-sample-of-data-when-constructing-decision/m-p/790672#M9033</link>
      <description>&lt;P&gt;I have datasets of more than 1 million observations, where the proportion of observations with the target variable "true" ranges from 20% down to 0.1%.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When E Miner builds a decision tree, does it use all 1 million observations, or does it take a sample of the data when pruning?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm slightly concerned that if E Miner samples the data before pruning, there is a significant chance that the splits will be biased if, say, very few of the 0.1% target cases are selected. In many cases where the percentage is very small (often &amp;lt;1%), E Miner cannot produce a tree. Is that possibly because the sample happens not to include any of the 0.1% target cases?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Linked to the above: does anyone know what the optimal ratio of target 'hits' to 'non-hits' is for decision tree analysis? I.e., is it OK if around 10% of the data has a hit for the target variable? I am considering sampling my data before I run the decision tree analysis so that about 10% of it has the target variable true and 90% does not.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Jan 2022 12:07:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Does-E-Miner-take-a-sample-of-data-when-constructing-decision/m-p/790672#M9033</guid>
      <dc:creator>EC27556</dc:creator>
      <dc:date>2022-01-18T12:07:47Z</dc:date>
    </item>
    <item>
      <title>Re: Does E Miner take a sample of data when constructing decision trees with large datasets?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Does-E-Miner-take-a-sample-of-data-when-constructing-decision/m-p/794645#M9063</link>
      <description>SAS Enterprise Miner can split the data into training, validation and test datasets. This partition can be user-defined: &lt;A href="https://support.sas.com/documentation/onlinedoc/miner/casestudy_59123.pdf" target="_blank"&gt;https://support.sas.com/documentation/onlinedoc/miner/casestudy_59123.pdf&lt;/A&gt;.&lt;BR /&gt;Sensitivity shows how well the model identifies positive cases. If “hit” = true positive and “miss” = false negative, then sensitivity = hits/(hits+misses). A 1:1 hit:miss ratio gives a sensitivity of 0.5; a 2:1 ratio gives a sensitivity of about 0.67. A sensitivity between 70% and 100% is considered good.</description>
      <pubDate>Sat, 05 Feb 2022 03:22:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Does-E-Miner-take-a-sample-of-data-when-constructing-decision/m-p/794645#M9063</guid>
      <dc:creator>pink_poodle</dc:creator>
      <dc:date>2022-02-05T03:22:00Z</dc:date>
    </item>
  </channel>
</rss>

