<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: find the best variables to use and best segmentation in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/find-the-best-variables-to-use-and-best-segmentation/m-p/148095#M39163</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I was hoping to find a procedure that identifies the variables that are most significant for the dependent variable.&amp;nbsp; For example, if I have 20 variables and 1 dependent variable, I want to know which of the 20 variables are best at predicting the dependent variable.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 19 Mar 2014 16:51:34 GMT</pubDate>
    <dc:creator>podarum</dc:creator>
    <dc:date>2014-03-19T16:51:34Z</dc:date>
    <item>
      <title>find the best variables to use and best segmentation</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/find-the-best-variables-to-use-and-best-segmentation/m-p/148091#M39159</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P class="visitorText"&gt;What is the best procedure to use if I want to:&lt;/P&gt;&lt;P class="visitorText"&gt;1) Find the best variables to use in a model out of 30, and&lt;/P&gt;&lt;P class="visitorText"&gt;2) Examine the best breaks or cutoffs once I find those variables?&lt;/P&gt;&lt;P class="visitorText"&gt;&lt;/P&gt;&lt;P class="visitorText"&gt;&lt;SPAN class="visitorName"&gt;For &lt;/SPAN&gt;example, a score may be the best predictor of default (the dependent variable), segmented at 300, 500, 650, etc.&amp;nbsp; Thanks&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 18 Mar 2014 19:59:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/find-the-best-variables-to-use-and-best-segmentation/m-p/148091#M39159</guid>
      <dc:creator>podarum</dc:creator>
      <dc:date>2014-03-18T19:59:05Z</dc:date>
    </item>
    <item>
      <title>Re: find the best variables to use and best segmentation</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/find-the-best-variables-to-use-and-best-segmentation/m-p/148092#M39160</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am assuming that you've identified the variables that will be used as predictors. PROC VARCLUS can identify variables that load heavily and explain most of the variation. That way you can select a subset of the variables, fewer than the original 30, for further analysis. In a second phase, use k-means clustering to find the best cutoffs.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 18 Mar 2014 20:17:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/find-the-best-variables-to-use-and-best-segmentation/m-p/148092#M39160</guid>
      <dc:creator>stat_sas</dc:creator>
      <dc:date>2014-03-18T20:17:10Z</dc:date>
    </item>
    <item>
      <title>Re: find the best variables to use and best segmentation</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/find-the-best-variables-to-use-and-best-segmentation/m-p/148093#M39161</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you for the response; this is very helpful. What do you mean by "loading heavily"?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 18 Mar 2014 20:29:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/find-the-best-variables-to-use-and-best-segmentation/m-p/148093#M39161</guid>
      <dc:creator>podarum</dc:creator>
      <dc:date>2014-03-18T20:29:31Z</dc:date>
    </item>
    <item>
      <title>Re: find the best variables to use and best segmentation</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/find-the-best-variables-to-use-and-best-segmentation/m-p/148094#M39162</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;This is a data reduction concept: we try to reduce the dimensionality of the data. PROC VARCLUS applies principal components to identify groups of variables that are highly correlated within their own cluster but only weakly correlated with variables in other clusters. A loading is the correlation between a variable and a principal component. Please refer to the following link for further details.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_varclus_sect001.htm"&gt;http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_varclus_sect001.htm&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 18 Mar 2014 20:45:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/find-the-best-variables-to-use-and-best-segmentation/m-p/148094#M39162</guid>
      <dc:creator>stat_sas</dc:creator>
      <dc:date>2014-03-18T20:45:13Z</dc:date>
    </item>
    <item>
      <title>Re: find the best variables to use and best segmentation</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/find-the-best-variables-to-use-and-best-segmentation/m-p/148095#M39163</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I was hoping to find a procedure that identifies the variables that are most significant for the dependent variable.&amp;nbsp; For example, if I have 20 variables and 1 dependent variable, I want to know which of the 20 variables are best at predicting the dependent variable.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 19 Mar 2014 16:51:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/find-the-best-variables-to-use-and-best-segmentation/m-p/148095#M39163</guid>
      <dc:creator>podarum</dc:creator>
      <dc:date>2014-03-19T16:51:34Z</dc:date>
    </item>
    <item>
      <title>Re: find the best variables to use and best segmentation</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/find-the-best-variables-to-use-and-best-segmentation/m-p/148096#M39164</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Why?&amp;nbsp; You have all 20 measures.&amp;nbsp; Do you mean which variables are most closely correlated with the predicted value?&amp;nbsp; Then you need to consider the role of moderating and mediating variables.&amp;nbsp; Or do you mean which single variable is the best predictor?&amp;nbsp; If so, again I ask, why?&amp;nbsp; If you have all the variables available, then not using them is just, well, ignoring what you do have.&amp;nbsp; Or do you mean which variable (or variables) is the most economical predictor, in the sense of future data?&amp;nbsp; By economical, I mean those that lead to an accurate predicted value for the least cost of measurement.&amp;nbsp; I think you are concerned about building a predictive model.&amp;nbsp; If so, subject matter expertise should enter as well as statistical considerations.&amp;nbsp; Parsimony for the sake of parsimony alone will always lead to poor predictive models, just as overcomplexity can.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Use the methods outlined by @stat_sas above to get started.&amp;nbsp; If you feel some sort of compulsion to try variable selection methods, look at the LAR and LASSO methods in GLMSELECT.&amp;nbsp; DO NOT USE STEPWISE, FORWARD, BACKWARD OR ALL POSSIBLE SUBSETS REGRESSION.&amp;nbsp; These have been shown to produce biased results that lead to poor predictive models.&amp;nbsp; Google "Flom Cassell" for more info, or read Frank Harrell's book on regression methods.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Steve Denham&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 19 Mar 2014 17:42:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/find-the-best-variables-to-use-and-best-segmentation/m-p/148096#M39164</guid>
      <dc:creator>SteveDenham</dc:creator>
      <dc:date>2014-03-19T17:42:35Z</dc:date>
    </item>
    <item>
      <title>Re: find the best variables to use and best segmentation</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/find-the-best-variables-to-use-and-best-segmentation/m-p/148097#M39165</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Great advice, thanks.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;To answer your questions, I have 20 variables as predictors (for example time-on-books, FICO score, utilization, location, product, etc.) and 1 response variable (bad or not bad, as in defaulted loans).&amp;nbsp; A business unit has asked me to create a chart of the response variable segmented by the top 3 predictors, for example separating the bads/goods by Location, Product, and FICO.&amp;nbsp; It has to be the 3 most significant predictors.&amp;nbsp; Similar to a decision tree.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 19 Mar 2014 17:54:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/find-the-best-variables-to-use-and-best-segmentation/m-p/148097#M39165</guid>
      <dc:creator>podarum</dc:creator>
      <dc:date>2014-03-19T17:54:37Z</dc:date>
    </item>
  </channel>
</rss>

