<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Large value for variation explained using PROC VARCLUS in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Large-value-for-variation-explained-using-PROC-VARCLUS/m-p/112450#M5935</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I would be wary of converting missing data to a valid category. Because VARCLUS omits observations that contain missing values from the analysis, your coding gives a very different result than you would get if you used the following:&lt;/P&gt;&lt;P&gt;1. Binary variable: Y = 1, N = 0, missing data = '.'&lt;/P&gt;&lt;P&gt;2. Categorical: 1, 2, ..., n, where n is the number of categories. Missing data set to '.'&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 05 Sep 2012 14:57:59 GMT</pubDate>
    <dc:creator>Rick_SAS</dc:creator>
    <dc:date>2012-09-05T14:57:59Z</dc:date>
    <item>
      <title>Large value for variation explained using PROC VARCLUS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Large-value-for-variation-explained-using-PROC-VARCLUS/m-p/112449#M5934</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am relatively new to SAS; I have been using it for the past month to do statistical analysis on some small and medium-sized datasets. I am now working with a much larger dataset (~40,000 observations) with around 300 variables.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Of these 300 variables, more than half are not numeric (binary or categorical), so I created another dataset with the same number of observations in which every variable is numeric. My conversion rules are as follows:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;1. Binary variable: Y = 1, N = 0, missing data = 0&lt;/P&gt;&lt;P&gt;2. Categorical: 0, 1, 2, ..., n, where n is the number of categories. Missing data set to 0&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I then ran PROC VARCLUS on that data, hoping to reduce the number of variables and build a better prediction model:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc varclus data=worktable maxeigen=0.7 outtree=tree maxclusters=2;&lt;/P&gt;&lt;P&gt;var a--z;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This gives a total variation explained of about 30.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Increasing to maxclusters=3 gives a total variation explained of about 50, and increasing all the way to maxclusters=30 gives about 120.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;When I then increased to maxclusters=40, the total variation explained was about 130; it always goes up. From what I have read online, this value is normally in the 30 to 40 range and actually goes down once maxclusters exceeds 10.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am aware that every dataset is different. However, my result seems quite abnormal.
Do you have any suggestions or an explanation for why the total variation explained is so large?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you very much,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thao&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 05 Sep 2012 14:41:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Large-value-for-variation-explained-using-PROC-VARCLUS/m-p/112449#M5934</guid>
      <dc:creator>thaolp</dc:creator>
      <dc:date>2012-09-05T14:41:23Z</dc:date>
    </item>
    <item>
      <title>Re: Large value for variation explained using PROC VARCLUS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Large-value-for-variation-explained-using-PROC-VARCLUS/m-p/112450#M5935</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I would be wary of converting missing data to a valid category. Because VARCLUS omits observations that contain missing values from the analysis, your coding gives a very different result than you would get if you used the following:&lt;/P&gt;&lt;P&gt;1. Binary variable: Y = 1, N = 0, missing data = '.'&lt;/P&gt;&lt;P&gt;2. Categorical: 1, 2, ..., n, where n is the number of categories. Missing data set to '.'&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 05 Sep 2012 14:57:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Large-value-for-variation-explained-using-PROC-VARCLUS/m-p/112450#M5935</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2012-09-05T14:57:59Z</dc:date>
    </item>
    <item>
      <title>Re: Large value for variation explained using PROC VARCLUS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Large-value-for-variation-explained-using-PROC-VARCLUS/m-p/112451#M5936</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Rick,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If an observation has a missing value in any variable and I do not convert it to a value, VARCLUS does not process that observation. My dataset has missing data in almost all variables, so if I leave the missing values in place, VARCLUS reports that ~39,000 observations are omitted due to missing values.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Do you have any suggestions for this case?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thao&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 05 Sep 2012 20:09:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Large-value-for-variation-explained-using-PROC-VARCLUS/m-p/112451#M5936</guid>
      <dc:creator>thaolp</dc:creator>
      <dc:date>2012-09-05T20:09:03Z</dc:date>
    </item>
  </channel>
</rss>

