<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic what is the optimal way to use variable selection node in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/what-is-the-optimal-way-to-use-variable-selection-node/m-p/181653#M2176</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have about 120 attributes in my modelling data, so I would like to reduce my variable set.&lt;/P&gt;&lt;P&gt;But I am not sure whether I have to add two separate Variable Selection nodes to my flow, one bypassing class variables and one bypassing interval variables, and then connect both of them to my modelling node.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I did so, but I am getting very few variables for my final model.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is there a problem with the way I am using the Variable Selection node?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Tue, 03 Jun 2014 01:53:24 GMT</pubDate>
    <dc:creator>omerzeybek</dc:creator>
    <dc:date>2014-06-03T01:53:24Z</dc:date>
    <item>
      <title>what is the optimal way to use variable selection node</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/what-is-the-optimal-way-to-use-variable-selection-node/m-p/181653#M2176</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have about 120 attributes in my modelling data, so I would like to reduce my variable set.&lt;/P&gt;&lt;P&gt;But I am not sure whether I have to add two separate Variable Selection nodes to my flow, one bypassing class variables and one bypassing interval variables, and then connect both of them to my modelling node.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I did so, but I am getting very few variables for my final model.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is there a problem with the way I am using the Variable Selection node?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 03 Jun 2014 01:53:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/what-is-the-optimal-way-to-use-variable-selection-node/m-p/181653#M2176</guid>
      <dc:creator>omerzeybek</dc:creator>
      <dc:date>2014-06-03T01:53:24Z</dc:date>
    </item>
    <item>
      <title>Re: what is the optimal way to use variable selection node</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/what-is-the-optimal-way-to-use-variable-selection-node/m-p/181654#M2177</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I am not sure the metadata from each of those nodes gets passed to your modeling node the way you intended. A quick way to check: click the Variables ellipsis for the modeling node and confirm that the roles of your variables are what you expected.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Reading the documentation, it does not seem to me that you need to pass a few variables at a time to get more variables selected. The Variable Selection node does a distribution analysis and runs a stepwise regression to keep the most important variables, so you can pass all variables at once.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If you would like to try other methods for variable selection, simply connect any of the nodes below before your modeling node. I am pretty sure all of them have the variable selection option set to Yes by default.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Tree and tree ensemble nodes have variable selection turned on by default. Try a Decision Tree, HPTree, Gradient Boosting, or HPForest node.&lt;/LI&gt;&lt;LI&gt;The Partial Least Squares and Survival nodes have variable selection options.&lt;/LI&gt;&lt;LI&gt;Interaction terms: the Variable Selection and Regression nodes have options to test interactions. Set Use Interactions to Yes on the Variable Selection node, and set Two-Factor Interactions to Yes on the Regression node.&lt;/LI&gt;&lt;LI&gt;Gini or Information Value variable importance from the Interactive Grouping node (licensed with Credit Scoring for SAS Enterprise Miner).&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am not sure which technique will work best for you; it depends on the data. I mostly use Information Value or tree-based variable importance, but that is just my preference.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I hope it helps!&lt;/P&gt;&lt;P&gt;Miguel&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 03 Jun 2014 03:15:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/what-is-the-optimal-way-to-use-variable-selection-node/m-p/181654#M2177</guid>
      <dc:creator>M_Maldonado</dc:creator>
      <dc:date>2014-06-03T03:15:35Z</dc:date>
    </item>
    <item>
      <title>Re: what is the optimal way to use variable selection node</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/what-is-the-optimal-way-to-use-variable-selection-node/m-p/181655#M2178</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Also check PROC CORR, which can compute Cronbach's coefficient alpha.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Xia Keshan&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 03 Jun 2014 12:53:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/what-is-the-optimal-way-to-use-variable-selection-node/m-p/181655#M2178</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2014-06-03T12:53:00Z</dc:date>
    </item>
    <item>
      <title>Re: what is the optimal way to use variable selection node</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/what-is-the-optimal-way-to-use-variable-selection-node/m-p/379475#M5646</link>
      <description>&lt;P&gt;There are many ways to identify important variables, including multiple options in the Variable Selection node depending on the measurement level of your target variable. If the variables that have been identified are not performing well, there could be many possible reasons contributing to the problem, such as:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;limited information in the predictor variables&lt;/LI&gt;
&lt;LI&gt;poorly conditioned input variables (perhaps a transformation of the variables would perform better)&lt;/LI&gt;
&lt;LI&gt;mismatch between the selection method and the modeling method (e.g. it does not necessarily make sense to use a regression-based linear variable selection technique when passing variables to a non-linear modeling algorithm like a Tree or Neural Network)&lt;/LI&gt;
&lt;LI&gt;lack of sufficient target signal (e.g. if you are modeling a rare event, it is possible that variables are being missed due to the criteria you are using for selecting them, in which case oversampling and/or considering decision weights/priors might be of help)&lt;/LI&gt;
&lt;LI&gt;lack of model flexibility (e.g. using a regression without considering the possibility of higher order terms/interactions and/or considering more flexible modeling strategies)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In general, I strongly advocate using several different variable selection strategies, including multiple Variable Selection nodes with different settings and Decision Tree nodes, to create a superset of possibly useful input variables. Depending on the model, further selection might be possible. Note that Decision Trees automatically select variables, Regression approaches can optionally use selection methods, and Random Forest models build Trees from subsets of variables as well as subsets of observations. To obtain the best possible predictions from your data, make sure you have not overly restricted the input variables, have considered possibly helpful binning and/or numeric transformations, and are using sufficiently flexible modeling methods.&lt;/P&gt;</description>
      <pubDate>Wed, 26 Jul 2017 17:18:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/what-is-the-optimal-way-to-use-variable-selection-node/m-p/379475#M5646</guid>
      <dc:creator>DougWielenga</dc:creator>
      <dc:date>2017-07-26T17:18:22Z</dc:date>
    </item>
  </channel>
</rss>

