<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Feature engineering with autopilot actions, and sampling (surveyselect vs partition procedures) in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Feature-engineering-with-autopilot-actions-and-samplig/m-p/800673#M9104</link>
    <description>&lt;P&gt;Hi, community,&lt;/P&gt;&lt;P&gt;I want to describe a couple of situations where Viya (3.5) behaves differently from SAS 9.4.&lt;/P&gt;&lt;P&gt;The cases:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;dataSciencePilot.featureMachine&lt;/LI&gt;&lt;LI&gt;PROC PARTITION vs. PROC SURVEYSELECT&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;STRONG&gt;In the first case&lt;/STRONG&gt;, I encountered strange behavior with a simple dataset of 15k observations and around 120 features.&lt;/P&gt;&lt;P&gt;On the first run, on a machine with 512 GB of RAM and 80 cores, I included some date variables in the input list.&lt;/P&gt;&lt;P&gt;To my surprise, the action set used all the available RAM and swap, and the OS (Red Hat) killed the CAS process. After that, I realized the date variables were not useful for the model I was going to build, so I dropped them.&lt;/P&gt;&lt;P&gt;With that change, the process completed in 30 seconds, so I assume the distribution of the date values caused the problem. My question is: why didn't Viya write any warning to the log?&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Second case&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;While sampling a dataset needed for a TSNE analysis, I first used the SURVEYSELECT procedure for stratified sampling. By mistake, I added the logical key of the dataset to the BY group.&lt;/P&gt;&lt;P&gt;The procedure log reported (correctly):&lt;/P&gt;&lt;PRE class=""&gt;ERROR: The number of strata, 14551, is greater than the total sample size, 1456.&lt;/PRE&gt;&lt;P&gt;That's fine! I recognized my mistake and, once I corrected it, got the results I needed.&lt;/P&gt;&lt;P&gt;Then I tried the same (erroneous) approach with the PARTITION procedure and got the same result as with dataSciencePilot.featureMachine: the action set consumed all the RAM and the swap space without any warning in the log.&lt;/P&gt;&lt;P&gt;Could you explain this behavior?&lt;/P&gt;&lt;P&gt;I appreciate any help you can provide.&lt;/P&gt;</description>
    <pubDate>Mon, 07 Mar 2022 15:22:13 GMT</pubDate>
    <dc:creator>andrea_magatti</dc:creator>
    <dc:date>2022-03-07T15:22:13Z</dc:date>
    <item>
      <title>Feature engineering with autopilot actions, and sampling (surveyselect vs partition procedures)</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Feature-engineering-with-autopilot-actions-and-samplig/m-p/800673#M9104</link>
      <description>&lt;P&gt;Hi, community,&lt;/P&gt;&lt;P&gt;I want to describe a couple of situations where Viya (3.5) behaves differently from SAS 9.4.&lt;/P&gt;&lt;P&gt;The cases:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;dataSciencePilot.featureMachine&lt;/LI&gt;&lt;LI&gt;PROC PARTITION vs. PROC SURVEYSELECT&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;STRONG&gt;In the first case&lt;/STRONG&gt;, I encountered strange behavior with a simple dataset of 15k observations and around 120 features.&lt;/P&gt;&lt;P&gt;On the first run, on a machine with 512 GB of RAM and 80 cores, I included some date variables in the input list.&lt;/P&gt;&lt;P&gt;To my surprise, the action set used all the available RAM and swap, and the OS (Red Hat) killed the CAS process. After that, I realized the date variables were not useful for the model I was going to build, so I dropped them.&lt;/P&gt;&lt;P&gt;With that change, the process completed in 30 seconds, so I assume the distribution of the date values caused the problem. My question is: why didn't Viya write any warning to the log?&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Second case&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;While sampling a dataset needed for a TSNE analysis, I first used the SURVEYSELECT procedure for stratified sampling. By mistake, I added the logical key of the dataset to the BY group.&lt;/P&gt;&lt;P&gt;The procedure log reported (correctly):&lt;/P&gt;&lt;PRE class=""&gt;ERROR: The number of strata, 14551, is greater than the total sample size, 1456.&lt;/PRE&gt;&lt;P&gt;That's fine! I recognized my mistake and, once I corrected it, got the results I needed.&lt;/P&gt;&lt;P&gt;Then I tried the same (erroneous) approach with the PARTITION procedure and got the same result as with dataSciencePilot.featureMachine: the action set consumed all the RAM and the swap space without any warning in the log.&lt;/P&gt;&lt;P&gt;Could you explain this behavior?&lt;/P&gt;&lt;P&gt;I appreciate any help you can provide.&lt;/P&gt;</description>
      <pubDate>Mon, 07 Mar 2022 15:22:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Feature-engineering-with-autopilot-actions-and-samplig/m-p/800673#M9104</guid>
      <dc:creator>andrea_magatti</dc:creator>
      <dc:date>2022-03-07T15:22:13Z</dc:date>
    </item>
  </channel>
</rss>

