<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Variable distribution and diversity measure for selection in SAS Data Science</title>
    <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191636#M2395</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Reeza,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have used PROC FREQ with NLEVELS, and it gives me an indication that, at this point, I would say is OK.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I agree this is not about procs; maybe it is something that could be set up in code.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I will keep you updated on the matter via this post. My thought was something like entropy weights, which give an indication of diversity within a variable, but maybe I am assuming that for the wrong types of variables.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 25 Apr 2014 15:59:08 GMT</pubDate>
    <dc:creator>chemicalab</dc:creator>
    <dc:date>2014-04-25T15:59:08Z</dc:date>
    <item>
      <title>Variable distribution and diversity measure for selection</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191634#M2393</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am trying to find an alternative way of checking the diversity of values in a variable. My goal is to set a measurement or weight that will indicate how well the variable is differentiated in its values and isn't characterized by, let's say, 70% or 80% of the same value. (Another example: variable X has 503 distinct values in 2,000 observations, which I guess is good.)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;My goal is to use that measure to select variables for segmentation modeling, because I believe such variables can discriminate my data well.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am looking for something besides PROC UNIVARIATE or PROC MEANS for statistics, or VARCLUS or PCA for variable selection. Any ideas?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you in advance&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 24 Apr 2014 08:38:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191634#M2393</guid>
      <dc:creator>chemicalab</dc:creator>
      <dc:date>2014-04-24T08:38:24Z</dc:date>
    </item>
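The question describes two signals: how many distinct values a variable has, and whether a single value covers something like 70% or 80% of the rows. The sketch below assumes a variable arrives as a plain Python list and reports both signals. It flags the variable when the most frequent value reaches a cutoff. The helper names `dominant_share` and `is_dominated` and the 0.7 cutoff are illustrative assumptions. They are not part of SAS or of the original post.

```python
from collections import Counter


def dominant_share(values):
    """Return (distinct_count, share_of_most_frequent_value) for a non-empty list."""
    counts = Counter(values)
    # most_common(1) returns [(value, count)] for the single most frequent value
    _, top_count = counts.most_common(1)[0]
    return len(counts), top_count / len(values)


def is_dominated(values, cutoff=0.7):
    """Hypothetical rule: flag a variable when one value covers at least `cutoff` of rows."""
    _, share = dominant_share(values)
    return share >= cutoff


# Example: 8 of 10 observations share the value "A"
sample = ["A"] * 8 + ["B", "C"]
print(dominant_share(sample))  # (3, 0.8)
print(is_dominated(sample))    # True
```

In SAS terms, the distinct count is what NLEVELS reports, and the top-value share could be read from the largest percentage in a one-way frequency table.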
    <item>
      <title>Re: Variable distribution and diversity measure for selection</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191635#M2394</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;The NLEVELS option in PROC FREQ will tell you how many distinct values each variable has.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;At first glance, it sounds to me like you're looking for a method more than a proc, i.e. a uniqueness measure.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Not having any science behind this, I'd consider looking at the percentage of unique values, i.e. 503/2000 is about 25% uniqueness, assuming an equal distribution, which is unlikely.&lt;/P&gt;&lt;P&gt;I have to do this in a few weeks for something I'm working on, so if you find something else that works, please post back!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 25 Apr 2014 14:50:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191635#M2394</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2014-04-25T14:50:00Z</dc:date>
    </item>
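The percent-of-unique-values idea is the number of distinct values divided by the number of observations. With the numbers from the question, 503 / 2000 is 0.2515, or about 25%. The sketch below uses a hypothetical helper, `uniqueness_ratio`. It also illustrates the caveat about equal distribution. In the sample data, the first 503 observations are all distinct and the remaining 1,497 all repeat the value 0. The ratio is still about 25%, yet one value covers roughly 75% of the rows.

```python
def uniqueness_ratio(values):
    """Distinct values divided by number of observations (hypothetical helper)."""
    return len(set(values)) / len(values)


# 503 distinct values in 2,000 observations, as in the question, but
# deliberately skewed: every value after the first 503 repeats 0.
values = list(range(503)) + [0] * (2000 - 503)
print(round(uniqueness_ratio(values), 4))  # 0.2515
print(values.count(0) / len(values))       # 0.749, so one value dominates
```

The ratio only counts levels. It needs a companion measure of how the rows are spread across those levels.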
    <item>
      <title>Re: Variable distribution and diversity measure for selection</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191636#M2395</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Reeza,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have used PROC FREQ with NLEVELS, and it gives me an indication that, at this point, I would say is OK.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I agree this is not about procs; maybe it is something that could be set up in code.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I will keep you updated on the matter via this post. My thought was something like entropy weights, which give an indication of diversity within a variable, but maybe I am assuming that for the wrong types of variables.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 25 Apr 2014 15:59:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191636#M2395</guid>
      <dc:creator>chemicalab</dc:creator>
      <dc:date>2014-04-25T15:59:08Z</dc:date>
    </item>
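One common way to make the entropy idea concrete is normalized Shannon entropy of the value frequencies: H = -sum(p_i * log p_i) / log k, where p_i is the share of rows taking the i-th value and k is the number of distinct values. The result is 0 when every row has the same value and 1 when all values occur equally often. This is a sketch of that standard formula, not necessarily the exact weighting the poster had in mind. The helper name `normalized_entropy` is hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(values):
    """Shannon entropy of the value frequencies, scaled to the range [0, 1].

    0 means every observation has the same value; 1 means all distinct
    values occur equally often.
    """
    n = len(values)
    counts = Counter(values)
    k = len(counts)
    if k == 1:
        return 0.0  # a constant variable carries no diversity
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


print(round(normalized_entropy(["A", "B", "C", "D"]), 3))       # 1.0
print(round(normalized_entropy(["A"] * 8 + ["B", "C"]), 3))     # 0.582
```

Entropy works on frequencies of discrete levels. For a continuous variable with almost all values distinct, it is close to 1 and says little, so binning first would likely be needed.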
    <item>
      <title>Re: Variable distribution and diversity measure for selection</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191637#M2396</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;For segmentation you need variables that have high variability and are uncorrelated with each other. Otherwise the solution will not converge.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 25 Apr 2014 16:35:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191637#M2396</guid>
      <dc:creator>stat_sas</dc:creator>
      <dc:date>2014-04-25T16:35:30Z</dc:date>
    </item>
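The "uncorrelated" half of this advice can be screened with pairwise Pearson correlations. The sketch below assumes each variable is a numeric list stored in a dict keyed by variable name. It returns the pairs whose absolute correlation exceeds a cutoff. The names `pearson` and `correlated_pairs`, the 0.8 cutoff, and the choice to treat constant variables as uncorrelated are all assumptions of this illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant variable has no defined correlation; treating it as 0
        # here is an assumption of this sketch.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(data, threshold=0.8):
    """Return pairs of variable names whose absolute correlation exceeds threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(data), 2)
        if abs(pearson(data[a], data[b])) > threshold
    ]


data = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 11],  # almost exactly twice x1
    "x3": [5, 1, 4, 2, 3],
}
print(correlated_pairs(data))  # [('x1', 'x2')]
```

From each flagged pair, one variable would typically be kept for clustering. PROC VARCLUS, mentioned in the question, addresses the same redundancy at the level of variable clusters.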
    <item>
      <title>Re: Variable distribution and diversity measure for selection</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191638#M2397</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;eMiner does some of this automatically, as does a lot of automated data mining software. My plan was to look into their classification methods and decide how I wanted to do mine &lt;img id="smileyhappy" class="emoticon emoticon-smileyhappy" src="https://communities.sas.com/i/smilies/16x16_smiley-happy.png" alt="Smiley Happy" title="Smiley Happy" /&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 25 Apr 2014 16:53:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191638#M2397</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2014-04-25T16:53:49Z</dc:date>
    </item>
    <item>
      <title>Re: Variable distribution and diversity measure for selection</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191639#M2398</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;?????&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 25 Apr 2014 17:39:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191639#M2398</guid>
      <dc:creator>chemicalab</dc:creator>
      <dc:date>2014-04-25T17:39:05Z</dc:date>
    </item>
    <item>
      <title>Re: Variable distribution and diversity measure for selection</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191640#M2399</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Sounds like a plan, but I will try something in code; I don't trust EMiner that much.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 25 Apr 2014 17:41:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191640#M2399</guid>
      <dc:creator>chemicalab</dc:creator>
      <dc:date>2014-04-25T17:41:26Z</dc:date>
    </item>
    <item>
      <title>Re: Variable distribution and diversity measure for selection</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191641#M2400</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;What's not to trust about eMiner?&lt;/P&gt;&lt;P&gt;It's not really a black-box tool, and it definitely requires user experience in both the tool and statistical methods.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 25 Apr 2014 17:51:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191641#M2400</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2014-04-25T17:51:12Z</dc:date>
    </item>
    <item>
      <title>Re: Variable distribution and diversity measure for selection</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191642#M2401</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have noticed many bugs and wrongs in Eminer computations. It's mostly good to use once the analytical record has been built and is ready after coding, e.g. for predictive modeling (and model comparison) or segmentation. What it mainly offers is time efficiency.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 25 Apr 2014 19:17:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191642#M2401</guid>
      <dc:creator>chemicalab</dc:creator>
      <dc:date>2014-04-25T19:17:15Z</dc:date>
    </item>
    <item>
      <title>Re: Variable distribution and diversity measure for selection</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191643#M2402</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Can you expand on the bugs/wrongs? I'm getting ready to use eMiner for a large, important project and am highly interested if there's a reason I shouldn't be. &lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 25 Apr 2014 19:20:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191643#M2402</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2014-04-25T19:20:57Z</dc:date>
    </item>
    <item>
      <title>Re: Variable distribution and diversity measure for selection</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191644#M2403</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Depends, what type of project is it?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 25 Apr 2014 19:22:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191644#M2403</guid>
      <dc:creator>chemicalab</dc:creator>
      <dc:date>2014-04-25T19:22:52Z</dc:date>
    </item>
    <item>
      <title>Re: Variable distribution and diversity measure for selection</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191645#M2404</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Fraud detection is the general purpose.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 25 Apr 2014 19:24:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191645#M2404</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2014-04-25T19:24:32Z</dc:date>
    </item>
    <item>
      <title>Re: Variable distribution and diversity measure for selection</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191646#M2405</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Wow, great post everyone, very lively today! With respect to any bugs or issues with Enterprise Miner, SAS Tech Support is a great resource for troubleshooting. With respect to coding inside EM, there are many options to customize your flows, including the Code node, the Transformations node, etc. As Reeza stated earlier, some of the finer features may require training and experience. There are many, many macros and macro variables available to add to your coding experience.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Jonathan&lt;/P&gt;&lt;P&gt;Product Manager - SAS Enterprise Miner&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 25 Apr 2014 19:25:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191646#M2405</guid>
      <dc:creator>jwexler</dc:creator>
      <dc:date>2014-04-25T19:25:04Z</dc:date>
    </item>
    <item>
      <title>Re: Variable distribution and diversity measure for selection</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191647#M2406</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Chemicalab,&lt;/P&gt;&lt;P&gt;If you think there are bugs in your system, talk to your SAS Admin or to Tech Support.&lt;BR /&gt;Make sure that the hot fixes you need have been applied. Feel free to google your EM version, e.g. google "SAS Enterprise Miner 12.1 Hot Fix", and see what is there.&lt;BR /&gt;For the most part, we find bugs before our customers do &lt;img id="smileyhappy" class="emoticon emoticon-smileyhappy" src="https://communities.sas.com/i/smilies/16x16_smiley-happy.png" alt="Smiley Happy" title="Smiley Happy" /&gt;.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Good luck!&lt;BR /&gt;-Miguel&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 25 Apr 2014 19:32:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191647#M2406</guid>
      <dc:creator>M_Maldonado</dc:creator>
      <dc:date>2014-04-25T19:32:52Z</dc:date>
    </item>
    <item>
      <title>Re: Variable distribution and diversity measure for selection</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191648#M2407</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Will make sure to do that. Regarding the initial question of this post: Reeza, I will get back to you on the diversity measure. I think I am on to something, but it requires some coding; it will work towards the entropy weights I mentioned earlier. Will let you know.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 25 Apr 2014 19:39:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191648#M2407</guid>
      <dc:creator>chemicalab</dc:creator>
      <dc:date>2014-04-25T19:39:30Z</dc:date>
    </item>
    <item>
      <title>Re: Variable distribution and diversity measure for selection</title>
      <link>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191649#M2408</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;"My goal is to set a measurement or weight that will indicate how well the variable is differentiated in its values and isn't characterized of lets 70% or 80 % of the same value."&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Are these not variables with platykurtic distributions?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="http://en.wikipedia.org/wiki/Kurtosis" title="http://en.wikipedia.org/wiki/Kurtosis"&gt;Kurtosis - Wikipedia, the free encyclopedia&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;Kurtosis is calculated by many SAS procedures, the most efficient of which would be PROC HPDMDB. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;&lt;A href="http://support.sas.com/documentation/cdl/en/prochp/66704/PDF/default/prochp.pdf" title="http://support.sas.com/documentation/cdl/en/prochp/66704/PDF/default/prochp.pdf"&gt;http://support.sas.com/documentation/cdl/en/prochp/66704/PDF/default/prochp.pdf&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;"&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;I have noticed many bugs and wrongs in Eminer computations&lt;/SPAN&gt;" - like what? I am not only a developer for Enterprise Miner - I am also a certified Predictive Modeler for Enterprise Miner and I use it nearly everyday for very complex tasks. While I occasionally see things that I think are bugs, it's nearly always something I just don't understand. 
Enterprise Miner is tested for functionality and numerical validity year-round by a team of professional statisticians, data miners, programmers and testers all over the world. It is extremely unlikely that such a team could miss "&lt;/SPAN&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;many bugs and wrongs in Eminer computations&lt;/SPAN&gt;". However, i&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;f you truly think there are problems, these need to be reported to SAS technical support immediately. &lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sun, 27 Apr 2014 15:07:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Science/Variable-distribution-and-diversity-measure-for-selection/m-p/191649#M2408</guid>
      <dc:creator>PatrickHall</dc:creator>
      <dc:date>2014-04-27T15:07:36Z</dc:date>
    </item>
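The kurtosis suggestion can be checked with the simple moment-based excess kurtosis, m4 / m2^2 - 3, where m2 and m4 are the second and fourth central moments. Negative values indicate a platykurtic (flat, light-tailed) distribution, and positive values a leptokurtic (peaked, heavy-tailed) one. This is a sketch using the population-moment estimator. SAS procedures may report a small-sample-adjusted version, so values can differ somewhat on small data. The helper name `excess_kurtosis` is hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3.

    Negative: platykurtic (flat); positive: leptokurtic (peaked).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined for a constant variable")
    return m4 / m2 ** 2 - 3


flat = list(range(1, 11))           # values evenly spread over 1..10
peaked = [5] * 16 + [1, 9, 0, 10]   # 80% of rows share the value 5
print(round(excess_kurtosis(flat), 2))    # -1.22 (platykurtic)
print(round(excess_kurtosis(peaked), 2))  # 2.24 (leptokurtic)
```

The peaked example reflects the 70% to 80% case from the original question, and it comes out strongly positive. That is consistent with the idea that well-differentiated variables tend toward negative excess kurtosis.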
  </channel>
</rss>

