<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to handle large datasets (up to 3 Mio observations) for all models in SAS EM in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-handle-large-datasets-up-to-3-Mio-observations-for-all/m-p/432178#M6626</link>
    <description></description>
    <pubDate>Tue, 30 Jan 2018 13:41:04 GMT</pubDate>
    <dc:creator>MikeStockstill</dc:creator>
    <dc:date>2018-01-30T13:41:04Z</dc:date>
    <item>
      <title>How to handle large datasets (up to 3 Mio observations) for all models in SAS EM</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-handle-large-datasets-up-to-3-Mio-observations-for-all/m-p/431830#M6614</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In my project I have to build different&amp;nbsp;models for &lt;FONT color="#ff0000"&gt;large datasets&lt;/FONT&gt;, some of which may have &lt;FONT color="#ff0000"&gt;more than 3 million observations&lt;/FONT&gt; and hundreds of input variables. For logistic regression (LR) and decision trees (DT) the corresponding nodes work fine, but the nodes for some machine learning methods, such as SVM, Random Forest, Gradient Boosting, and k-Nearest-Neighbors, sometimes fail to finish running and produce &lt;FONT color="#ff0000"&gt;error messages&lt;/FONT&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If I draw &lt;FONT color="#ff0000"&gt;a small subsample&lt;/FONT&gt; and apply those methods with exactly the same hyper-parameter settings, then &lt;FONT color="#ff0000"&gt;everything is fine&lt;/FONT&gt;. That's why I conclude that&lt;FONT color="#ff0000"&gt;&amp;nbsp;those errors are related to sample size&lt;/FONT&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So I wonder whether there is a way to use all the training data (e.g. 3 million x 0.7 = 2.1 million training observations) to build SVM, RF, GBDT, kNN and so on. I think that "Group" nodes may be helpful for something like "batching" the data, but I am not sure how that would work specifically.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If you have any suggestions, I would be glad to discuss them and would really appreciate it.&lt;/P&gt;&lt;P&gt;Thanks very much!&lt;/P&gt;</description>
      <pubDate>Mon, 29 Jan 2018 16:08:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-handle-large-datasets-up-to-3-Mio-observations-for-all/m-p/431830#M6614</guid>
      <dc:creator>YG1992</dc:creator>
      <dc:date>2018-01-29T16:08:19Z</dc:date>
    </item>
    <item>
      <title>Re: How to handle large datasets (up to 3 Mio observations) for all models in SAS EM</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/How-to-handle-large-datasets-up-to-3-Mio-observations-for-all/m-p/432178#M6626</link>
      <description>&lt;P&gt;Hello YG1992 -&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A first step is to examine the text of the error to find more specific information about the problem.&amp;nbsp; Based on the text of the error, try some searches on this page:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://support.sas.com/notes/" target="_self"&gt;http://support.sas.com/notes/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Example: if the errors are out-of-memory errors, then try notes such as this one.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://support.sas.com/kb/61/376.html" target="_blank"&gt;61376 - Overcoming "insufficient memory ." and "parameter larger than documented limit" error messages&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If none of that information leads you to a resolution, then &lt;A href="http://support.sas.com/kb/58/396.html" target="_self"&gt;turn on the MPRINT option&lt;/A&gt;, &lt;A href="http://support.sas.com/kb/46/764.html" target="_self"&gt;create a model package&lt;/A&gt;, and &lt;A href="https://support.sas.com/en/technical-support.html" target="_self"&gt;contact technical support&lt;/A&gt; for assistance.&lt;/P&gt;
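&lt;P&gt;As a minimal sketch of the first of those steps, the MPRINT system option can be switched on with an OPTIONS statement. Where to submit it is an assumption here (for example, the project start code in SAS Enterprise Miner); the linked note describes the supported procedure.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Assumption: submitted before the failing node is rerun */
options mprint;  /* write the SAS statements generated by macros to the log */&lt;/CODE&gt;&lt;/PRE&gt;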
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Have a great day.&lt;/P&gt;</description>
      <pubDate>Tue, 30 Jan 2018 13:41:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/How-to-handle-large-datasets-up-to-3-Mio-observations-for-all/m-p/432178#M6626</guid>
      <dc:creator>MikeStockstill</dc:creator>
      <dc:date>2018-01-30T13:41:04Z</dc:date>
    </item>
  </channel>
</rss>

