<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: EM Decision Trees - Stratification or Not - Validation or Test? in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178021#M2112</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Since you have two datasets and want to use one for model development and the other for validation, how about using a user-defined method within the partition node?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Naeem&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 19 Nov 2014 15:26:37 GMT</pubDate>
    <dc:creator>stat_sas</dc:creator>
    <dc:date>2014-11-19T15:26:37Z</dc:date>
    <item>
      <title>EM Decision Trees - Stratification or Not - Validation or Test?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178019#M2110</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Assume I have n = 50,000 records in my training dataset, plus 25,000 records in a separate, new dataset.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I would like to submit all of it to EM Decision Trees so that 100% of my training dataset is used for estimation, while all 25,000 records of the new dataset serve as my validation, or test, dataset.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;From reviewing the literature, it looks like I will have to do something via stratification, but that requires the percentages for each of the levels, so I am a little confused there.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Maybe this is not done with a Data Partition node? The following is another scenario that would be ideal:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Training = 80% of 50,000&lt;/P&gt;&lt;P&gt;Validation = 20% of 50,000&lt;/P&gt;&lt;P&gt;Test = 100% of 25,000&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;How can I make this happen exactly?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you very much in advance,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Zach Feinstein, Statistical Data Modeler&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;SFM Mutual Insurance Company&lt;/STRONG&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 19 Nov 2014 14:37:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178019#M2110</guid>
      <dc:creator>Zachary</dc:creator>
      <dc:date>2014-11-19T14:37:21Z</dc:date>
    </item>
    <item>
      <title>Re: EM Decision Trees - Stratification or Not - Validation or Test?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178020#M2111</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Zach,&lt;/P&gt;&lt;P&gt;You would use the Data Partition node to get stratified samples (training, validation, or testing) from one data set.&lt;/P&gt;&lt;P&gt;In your example you want to use one data set twice, and another data set once. You can specify the role for each data set using the Role property.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;After you create your data source and add it to your diagram, set the Role property to Train, Validate, or Test.&lt;/P&gt;&lt;P&gt;In the screenshot below I have added the same data source three times and set the role of each data source node to Train, Validate, and Test respectively, as an example similar to your question.&lt;/P&gt;&lt;P&gt;&lt;IMG alt="roles.png" class="jive-image-thumbnail jive-image" height="162" src="https://communities.sas.com/legacyfs/online/7949_roles.png" style="height: 162px; width: 699.592356687898px;" width="700" /&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is this what you needed?&lt;/P&gt;&lt;P&gt;Good luck,&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;Miguel&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 19 Nov 2014 15:08:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178020#M2111</guid>
      <dc:creator>M_Maldonado</dc:creator>
      <dc:date>2014-11-19T15:08:50Z</dc:date>
    </item>
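    <!--
Editorial sketch, not part of the original thread and not SAS Enterprise Miner code. It illustrates, in plain Python, the layout discussed above: a stratified 80/20 split of the first data set into training and validation, with the second data set kept whole as the test set. The function name stratified_split, the column name "target", the seed, and the scaled-down row counts are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(rows, target_key, train_fraction=0.8, seed=12345):
    """Split rows into (train, validate), keeping the target mix per level."""
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for row in rows:
        by_level[row[target_key]].append(row)

    train, validate = [], []
    for level_rows in by_level.values():
        shuffled = level_rows[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        validate.extend(shuffled[cut:])
    return train, validate


# Hypothetical stand-ins for the two data sets (scaled down from 50,000 / 25,000).
dataset_a = [{"id": i, "target": int(i % 10 == 0)} for i in range(5000)]
dataset_b = [{"id": i, "target": int(i % 10 == 0)} for i in range(5000, 7500)]

train, validate = stratified_split(dataset_a, "target")
test = list(dataset_b)  # 100% of the second data set, never partitioned

print(len(train), len(validate), len(test))
```

Because the split is done within each target level, training and validation keep the same event rate as the first data set, which is the purpose of stratifying.
    -->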
    <item>
      <title>Re: EM Decision Trees - Stratification or Not - Validation or Test?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178021#M2112</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Since you have two datasets and want to use one for model development and the other for validation, how about using a user-defined method within the partition node?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Naeem&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 19 Nov 2014 15:26:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178021#M2112</guid>
      <dc:creator>stat_sas</dc:creator>
      <dc:date>2014-11-19T15:26:37Z</dc:date>
    </item>
    <item>
      <title>Re: EM Decision Trees - Stratification or Not - Validation or Test?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178022#M2113</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;That seems like a very reasonable way to do it. Thank you so much!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I think what I put together is the equivalent of what you did [picture below].&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Where may I find the output that compares or contrasts the scored nodes between the training and the test?&lt;/P&gt;&lt;P&gt;&lt;IMG alt="Capture.JPG" class="jive-image-thumbnail jive-image" src="https://communities.sas.com/legacyfs/online/7950_Capture.JPG" width="450" /&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 19 Nov 2014 15:32:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178022#M2113</guid>
      <dc:creator>Zachary</dc:creator>
      <dc:date>2014-11-19T15:32:51Z</dc:date>
    </item>
    <item>
      <title>Re: EM Decision Trees - Stratification or Not - Validation or Test?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178023#M2114</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;As long as your Data Partition node has Test set to 0%, yep, I'd have done it the exact same way.&lt;/P&gt;&lt;P&gt;The results of your model node (e.g. Decision Tree) include fit statistics for all your partitions. For more statistics such as ROC, lift, gain, and response, add a Model Comparison node and view its results.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I hope it helps,&lt;/P&gt;&lt;P&gt;M&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 19 Nov 2014 15:44:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178023#M2114</guid>
      <dc:creator>M_Maldonado</dc:creator>
      <dc:date>2014-11-19T15:44:54Z</dc:date>
    </item>
    <item>
      <title>Re: EM Decision Trees - Stratification or Not - Validation or Test?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178024#M2115</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;That, again, is some great help.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I suppose beggars cannot be choosers, but is it possible to see the same kind of tree output, displaying &lt;SPAN style="text-decoration: underline;"&gt;Training versus Test&lt;/SPAN&gt; instead of &lt;SPAN style="text-decoration: underline;"&gt;Training versus Validation&lt;/SPAN&gt;, for the statistically significant nodes from before?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 19 Nov 2014 15:58:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178024#M2115</guid>
      <dc:creator>Zachary</dc:creator>
      <dc:date>2014-11-19T15:58:05Z</dc:date>
    </item>
    <item>
      <title>Re: EM Decision Trees - Stratification or Not - Validation or Test?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178025#M2116</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;man, with EM you can always choose... or come up with a workaround.&lt;/P&gt;&lt;P&gt;what do you have in mind? just the tree plot with stats for train &amp;amp; test on the boxes? or something else?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 19 Nov 2014 16:43:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178025#M2116</guid>
      <dc:creator>M_Maldonado</dc:creator>
      <dc:date>2014-11-19T16:43:49Z</dc:date>
    </item>
    <item>
      <title>Re: EM Decision Trees - Stratification or Not - Validation or Test?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178026#M2117</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Basic tree plot with the stats for train &amp;amp; test within the two columns of boxes would be ideal.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But I think the only difficult part would be to ensure that the nodes are precisely the same as what was generated by default, or interactively, within the initial Training runs.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 19 Nov 2014 16:55:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178026#M2117</guid>
      <dc:creator>Zachary</dc:creator>
      <dc:date>2014-11-19T16:55:24Z</dc:date>
    </item>
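    <!--
Editorial sketch, not part of the original thread and not SAS Enterprise Miner output. It illustrates the idea in the post above: to compare Training and Test node by node, the split rules are frozen and both partitions are pushed through the same rules, so the nodes are identical by construction. The tree rules, the variable names ("age", "claims", "target"), and the toy rows are hypothetical.

```python
from collections import Counter


def leaf_for(row):
    """Hypothetical fixed tree: split rules assumed to come from training."""
    if row["age"] < 40:
        return "leaf 1: age < 40"
    if row["claims"] < 2:
        return "leaf 2: age >= 40, claims < 2"
    return "leaf 3: age >= 40, claims >= 2"


def leaf_stats(rows):
    """Return {leaf: (n, event_rate)} for rows pushed through the fixed tree."""
    counts, events = Counter(), Counter()
    for row in rows:
        leaf = leaf_for(row)
        counts[leaf] += 1
        events[leaf] += row["target"]
    return {leaf: (n, events[leaf] / n) for leaf, n in counts.items()}


# Toy rows standing in for the training and test partitions.
train = [
    {"age": 25, "claims": 0, "target": 0},
    {"age": 30, "claims": 3, "target": 1},
    {"age": 45, "claims": 1, "target": 0},
    {"age": 50, "claims": 0, "target": 0},
    {"age": 60, "claims": 2, "target": 1},
    {"age": 55, "claims": 4, "target": 1},
]
test = [
    {"age": 35, "claims": 1, "target": 0},
    {"age": 42, "claims": 0, "target": 1},
    {"age": 48, "claims": 1, "target": 0},
    {"age": 65, "claims": 3, "target": 1},
]

train_stats = leaf_stats(train)
test_stats = leaf_stats(test)

# Side-by-side view: one line per leaf, training figures next to test figures.
for leaf in sorted(train_stats):
    n_train, rate_train = train_stats[leaf]
    n_test, rate_test = test_stats.get(leaf, (0, float("nan")))
    print(f"{leaf}: train n={n_train} rate={rate_train:.2f} | "
          f"test n={n_test} rate={rate_test:.2f}")
```

Large gaps between the training and test event rates in a given leaf would suggest that the split behind that leaf does not generalize well.
    -->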
    <item>
      <title>Re: EM Decision Trees - Stratification or Not - Validation or Test?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178027#M2118</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have a couple ideas, will be in touch later today.&lt;/P&gt;&lt;P&gt;what EM version do you have?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 19 Nov 2014 17:01:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178027#M2118</guid>
      <dc:creator>M_Maldonado</dc:creator>
      <dc:date>2014-11-19T17:01:27Z</dc:date>
    </item>
    <item>
      <title>Re: EM Decision Trees - Stratification or Not - Validation or Test?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178028#M2119</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Zachary. In the words of Spider-Man, my spider sense is tingling.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I hope there are no differences between data set A (n=50,000) and data set B (n=25,000). I've been in this situation before: I was told data set B was collected the same way as data set A, but I found out later that there had been a slight methodological change in how B was collected. Yes, the variables in both data sets were the same, but the underlying values had different assumptions.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You wouldn't want to validate a model using data that differs in subtle ways from the data it was trained on.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Maybe there is a way to do both approaches and compare them. If you had a third data set for scoring, you could compare the results of the models.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 19 Nov 2014 17:06:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178028#M2119</guid>
      <dc:creator>jaredp</dc:creator>
      <dc:date>2014-11-19T17:06:03Z</dc:date>
    </item>
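    <!--
Editorial sketch, not part of the original thread. One simple way to act on the concern above is to compare each numeric input between data set A and data set B before trusting B as a test set. The example below computes a standardized mean difference per variable in plain Python; the function name, the toy values, and the choice of this particular statistic are assumptions, and other drift checks would serve equally well.

```python
from statistics import mean, stdev


def standardized_mean_difference(a, b):
    """(mean(a) - mean(b)) divided by the pooled sample standard deviation."""
    pooled_sd = ((stdev(a) ** 2 + stdev(b) ** 2) / 2) ** 0.5
    return (mean(a) - mean(b)) / pooled_sd


# Toy values for one input variable in each data set.
values_a = [10, 12, 11, 13, 9, 11]
values_b_same = [11, 10, 12, 12, 10, 11]      # same centre as A
values_b_shifted = [v + 5 for v in values_a]  # collected "differently"

smd_same = standardized_mean_difference(values_a, values_b_same)
smd_shifted = standardized_mean_difference(values_a, values_b_shifted)

print(f"same pool:    {smd_same:.2f}")
print(f"shifted pool: {smd_shifted:.2f}")
```

Values near zero are consistent with both data sets coming from the same pool; values far from zero flag a variable worth investigating before the model is validated on B.
    -->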
    <item>
      <title>Re: EM Decision Trees - Stratification or Not - Validation or Test?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178029#M2120</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;You raise an excellent point. Thank you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Actually, both datasets come from precisely the same pool. So perhaps that will aid in the discussion, configuration, and methodology behind seeing how the Training lines up with the Test within a Decision Tree.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Do you have any suggestions on how best to compare the results after the scoring? I have a full breadth of experience with the Training data and the Validation data, just not with the Test data.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I almost wish there were a way to use the Data Partition node so that 80% of the first dataset is used for Training, 20% of that dataset is used for Validation, and 100% of the "other data" becomes the Test. But the trick would be to have the Training/Validation/Test all in one dataset.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 19 Nov 2014 18:46:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178029#M2120</guid>
      <dc:creator>Zachary</dc:creator>
      <dc:date>2014-11-19T18:46:57Z</dc:date>
    </item>
    <item>
      <title>Re: EM Decision Trees - Stratification or Not - Validation or Test?</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178030#M2121</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thanks a bunch. EM 6.1.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 19 Nov 2014 18:47:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/EM-Decision-Trees-Stratification-or-Not-Validation-or-Test/m-p/178030#M2121</guid>
      <dc:creator>Zachary</dc:creator>
      <dc:date>2014-11-19T18:47:43Z</dc:date>
    </item>
  </channel>
</rss>

