<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Variable reduction in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-reduction/m-p/166747#M1840</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I am pretty new to modeling, and I am stuck with over 900 interval input variables. I need some ideas for reducing them. Is there any way to find the correlations between these variables so that redundancy can be handled? I am using Enterprise Miner and Enterprise Guide.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Thu, 12 Feb 2015 21:09:16 GMT</pubDate>
    <dc:creator>MinalMMurkhande</dc:creator>
    <dc:date>2015-02-12T21:09:16Z</dc:date>
    <item>
      <title>Variable reduction</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-reduction/m-p/166747#M1840</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I am pretty new to modeling, and I am stuck with over 900 interval input variables. I need some ideas for reducing them. Is there any way to find the correlations between these variables so that redundancy can be handled? I am using Enterprise Miner and Enterprise Guide.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 12 Feb 2015 21:09:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-reduction/m-p/166747#M1840</guid>
      <dc:creator>MinalMMurkhande</dc:creator>
      <dc:date>2015-02-12T21:09:16Z</dc:date>
    </item>
    <item>
      <title>Re: Variable reduction</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-reduction/m-p/166748#M1841</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hey Minal,&lt;/P&gt;&lt;P&gt;Enterprise Miner is really easy to learn. Use the reference help often (press F1 on your keyboard) and google for white papers on the analytics problems you are trying to solve.&lt;/P&gt;&lt;P&gt;Take a look at this thread, where we list a good number of ways to do variable selection: &lt;A __default_attr="58368" __jive_macro_name="thread" class="jive_macro jive_macro_thread" href="https://communities.sas.com/" modifiedtitle="true" title="what is the optimal way to use variable selection node"&gt;what is the optimal way to use variable selection node&lt;/A&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Good luck!&lt;/P&gt;&lt;P&gt;Miguel&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 13 Feb 2015 15:14:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-reduction/m-p/166748#M1841</guid>
      <dc:creator>M_Maldonado</dc:creator>
      <dc:date>2015-02-13T15:14:23Z</dc:date>
    </item>
    <item>
      <title>Re: Variable reduction</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-reduction/m-p/166749#M1842</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi &lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 13px; background-color: #ffffff;"&gt;Miguel,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you for your reply! I had been waiting for someone to answer. But given a set of census data (400 interval variables) and financial data (another 400 interval variables), how do I find the correlations among these variables? Would correlation help as a first screening step, or should I start directly with decision tree / GBM models?&lt;/P&gt;&lt;P&gt;I can calculate Spearman and Hoeffding coefficients and also the VIF, but all of that comes later, once I have run the model. How do I start off with the initial screening? It would be very good if I could screen the variables using the Pearson correlation statistic, but that would give me a matrix with 400 rows and 400 columns &lt;img id="smileysad" class="emoticon emoticon-smileysad" src="https://communities.sas.com/i/smilies/16x16_smiley-sad.png" alt="Smiley Sad" title="Smiley Sad" /&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Also, I did variable clustering. I selected the one variable with the lowest 1-R^2 in each cluster. Would that work too?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I don't know if I am thinking about this in the right way. Any help would be appreciated.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Minal&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sun, 15 Feb 2015 16:22:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-reduction/m-p/166749#M1842</guid>
      <dc:creator>MinalMMurkhande</dc:creator>
      <dc:date>2015-02-15T16:22:04Z</dc:date>
    </item>
    <item>
      <title>Re: Variable reduction</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-reduction/m-p/166750#M1843</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P style="margin-bottom: .0001pt;"&gt;&lt;SPAN lang="EN-US" style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;Hi,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;&lt;SPAN lang="EN-US" style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;You didn't mention the purpose of your stu&lt;/SPAN&gt;dy. Prediction? What is your target variable?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;&lt;SPAN lang="EN-US" style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;If census data are predictors, and financial variables are the target, then try PLS.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;&lt;SPAN lang="EN-US" style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;Yes, Variable Clustering is a good tool for explanatory analysis or for dimensionality reduction.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;&lt;SPAN lang="EN-US" style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;You can also do a PCA or variable selection (node). Or you can use some of the modeling nodes (tree, forest, regression, LAR/LASSO, PLS, etc.) to select useful variables.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN style="font-family: Arial, sans-serif; font-size: 10pt; line-height: 1.5em;"&gt;Gergely&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: Arial, sans-serif; font-size: 10pt; line-height: 1.5em;"&gt;Message was edited by: Gergely Bathó&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 16 Feb 2015 00:23:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-reduction/m-p/166750#M1843</guid>
      <dc:creator>gergely_batho</dc:creator>
      <dc:date>2015-02-16T00:23:29Z</dc:date>
    </item>
    <item>
      <title>Re: Variable reduction</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-reduction/m-p/166751#M1844</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have a target variable which is binary. Both the census data and the finance data are predictors. Each has close to 400 variables, so the total number of variables that I need to reduce is 800. Variable clustering did help, but I was looking for more accurate solutions, like correlation. Is there no way for me to find the correlations between these variables? Also, would finding correlations for so many variables work?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Minal&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 16 Feb 2015 02:23:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-reduction/m-p/166751#M1844</guid>
      <dc:creator>MinalMMurkhande</dc:creator>
      <dc:date>2015-02-16T02:23:49Z</dc:date>
    </item>
    <item>
      <title>Re: Variable reduction</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-reduction/m-p/166752#M1845</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P style="margin-bottom: .0001pt;"&gt;&lt;SPAN lang="EN-US" style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;Hi Minal,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN lang="EN-US" style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;SAS is able to calculate the correlation matrix of those 800 variables. But as you already noted, it is quite hard to look at 800x800/2 coefficients manually.&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: .0001pt;"&gt;&lt;SPAN lang="EN-US" style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;One way to handle it is to calculate the first K principal components (PCA Node in Enterprise Miner) and use them in a predictive model. PCA&lt;STRONG&gt; is&lt;/STRONG&gt; based on the correlation matrix. Instead of the original variables, you will have K factor scores.&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: .0001pt;"&gt;&lt;SPAN lang="EN-US" style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;PCA factors and factor scores are hard to interpret, because each factor is a mixture (linear combination) of all the variables. With variable clustering you also get factors, but each of them depends on only some of the variables. Variable clustering&lt;STRONG&gt; is also&lt;/STRONG&gt; based on correlations. It iteratively calculates PCA on the original variables (and on linear combinations of variables).&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: .0001pt;"&gt;&lt;SPAN lang="EN-US" style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;You can keep one variable from each cluster (as you described), or you can keep a linear combination of the variables in the cluster. 
The former is more interpretable; the latter is more “precise” in some sense.&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="margin-bottom: .0001pt;"&gt;&lt;SPAN lang="EN-US" style="font-size: 10pt; font-family: Arial, sans-serif;"&gt;Gergely&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 16 Feb 2015 12:48:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-reduction/m-p/166752#M1845</guid>
      <dc:creator>gergely_batho</dc:creator>
      <dc:date>2015-02-16T12:48:02Z</dc:date>
    </item>
    <item>
      <title>Re: Variable reduction</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-reduction/m-p/166753#M1846</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;BR /&gt;Thank you, Gergely!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 16 Feb 2015 14:38:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-reduction/m-p/166753#M1846</guid>
      <dc:creator>MinalMMurkhande</dc:creator>
      <dc:date>2015-02-16T14:38:39Z</dc:date>
    </item>
    <item>
      <title>Re: Variable reduction</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-reduction/m-p/166754#M1847</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;@Gergely, thanks, that is some solid advice!&lt;/P&gt;&lt;P&gt;@Minal, if you are interested in calculating the VIF, here is one way to approach it: &lt;A __default_attr="5842" __jive_macro_name="document" class="jive_macro jive_macro_document" href="https://communities.sas.com/"&gt;&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Miguel&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 16 Feb 2015 14:44:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-reduction/m-p/166754#M1847</guid>
      <dc:creator>M_Maldonado</dc:creator>
      <dc:date>2015-02-16T14:44:15Z</dc:date>
    </item>
  </channel>
</rss>

