<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Measure correlation between binary variables in classification task in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Measure-correlation-between-binary-variables-in-classification/m-p/350248#M18358</link>
    <description>&lt;P&gt;Hello everyone,&lt;/P&gt;&lt;P&gt;I have always found the SAS community very useful. This time I could not find what I was looking for, so I am posting for the first time. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am working on a classification task for marketing using Enterprise Miner (latest version). I have 30 variables and I must predict whether the customer will accept or refuse our next direct marketing offer.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Besides the target variable and the socio-demographic and firmographic variables, I have &lt;STRONG&gt;5 binary variables&lt;/STRONG&gt;. Each of these binary variables represents whether the customer responded to a previous marketing offer (campaign 1 to campaign 5).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to understand the correlation among these five binary variables and, eventually, the worth of this binary vector in predicting the target variable.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;After some research, I found that the best candidates are the phi coefficient (PROC CORR with the PEARSON option on binary variables) and the tetrachoric correlation (a special case of the polychoric correlation for binary variables).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;With the latter measure, I obtain a much higher correlation than with phi. Do you know why?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In your experience, what is the best correlation measure in this context?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you very much and enjoy your Easter.&lt;/P&gt;</description>
    <pubDate>Sat, 15 Apr 2017 11:47:38 GMT</pubDate>
    <dc:creator>Seymour93</dc:creator>
    <dc:date>2017-04-15T11:47:38Z</dc:date>
    <item>
      <title>Measure correlation between binary variables in classification task</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Measure-correlation-between-binary-variables-in-classification/m-p/350248#M18358</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;&lt;P&gt;I have always found the SAS community very useful. This time I could not find what I was looking for, so I am posting for the first time. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am working on a classification task for marketing using Enterprise Miner (latest version). I have 30 variables and I must predict whether the customer will accept or refuse our next direct marketing offer.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Besides the target variable and the socio-demographic and firmographic variables, I have &lt;STRONG&gt;5 binary variables&lt;/STRONG&gt;. Each of these binary variables represents whether the customer responded to a previous marketing offer (campaign 1 to campaign 5).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to understand the correlation among these five binary variables and, eventually, the worth of this binary vector in predicting the target variable.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;After some research, I found that the best candidates are the phi coefficient (PROC CORR with the PEARSON option on binary variables) and the tetrachoric correlation (a special case of the polychoric correlation for binary variables).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;With the latter measure, I obtain a much higher correlation than with phi. Do you know why?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In your experience, what is the best correlation measure in this context?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you very much and enjoy your Easter.&lt;/P&gt;</description>
      <pubDate>Sat, 15 Apr 2017 11:47:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Measure-correlation-between-binary-variables-in-classification/m-p/350248#M18358</guid>
      <dc:creator>Seymour93</dc:creator>
      <dc:date>2017-04-15T11:47:38Z</dc:date>
    </item>
    <item>
      <title>Re: Measure correlation between binary variables in classification task</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Measure-correlation-between-binary-variables-in-classification/m-p/350254#M18359</link>
      <description>&lt;P&gt;Here is a short article on the topic:&amp;nbsp;&lt;A href="http://www.john-uebersax.com/stat/tetra.htm" target="_blank"&gt;http://www.john-uebersax.com/stat/tetra.htm&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The main thing you would have to justify is the set of assumptions behind the tetrachoric correlation.&lt;/P&gt;
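&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A rough sketch of how the two measures could be compared for one pair of indicators is below. The names work.customers, camp1 and camp2 are hypothetical and not taken from the original post. Phi is the Pearson correlation of the 0/1 codes, and its largest attainable value falls below 1 when the two response rates differ. The tetrachoric correlation instead estimates the correlation of two latent normal variables that are assumed to have been cut into the observed 0/1 values. That latent-normality assumption is the one that needs justifying, and it is also why the tetrachoric value is usually larger in magnitude than phi.&lt;/P&gt;
&lt;PRE&gt;
/* Sketch only: data set and variable names are assumptions */
proc freq data=work.customers;
  /* CHISQ reports the phi coefficient for a 2x2 table;       */
  /* PLCORR reports the polychoric correlation, which for a   */
  /* 2x2 table is the tetrachoric correlation                 */
  tables camp1*camp2 / chisq plcorr;
run;
&lt;/PRE&gt;
&lt;P&gt;Repeating the TABLES request for each pair of the five indicators would give both measures side by side.&lt;/P&gt;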
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Art, CEO, AnalystFinder.com&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 15 Apr 2017 14:04:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Measure-correlation-between-binary-variables-in-classification/m-p/350254#M18359</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2017-04-15T14:04:16Z</dc:date>
    </item>
    <item>
      <title>Re: Measure correlation between binary variables in classification task</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Measure-correlation-between-binary-variables-in-classification/m-p/350255#M18360</link>
      <description>&lt;P&gt;Thank you.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So the safest approach would be to use phi?&lt;/P&gt;</description>
      <pubDate>Sat, 15 Apr 2017 14:09:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Measure-correlation-between-binary-variables-in-classification/m-p/350255#M18360</guid>
      <dc:creator>Seymour93</dc:creator>
      <dc:date>2017-04-15T14:09:32Z</dc:date>
    </item>
    <item>
      <title>Re: Measure correlation between binary variables in classification task</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Measure-correlation-between-binary-variables-in-classification/m-p/350257#M18361</link>
      <description>&lt;P&gt;I'm not a statistician, so I can't really provide defensible advice. Personally, I would use phi or, if I were trying to predict from those variables (which I think you said was the task), logistic regression.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Art, CEO, AnalystFinder.com&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 15 Apr 2017 14:27:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Measure-correlation-between-binary-variables-in-classification/m-p/350257#M18361</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2017-04-15T14:27:40Z</dc:date>
    </item>
    <item>
      <title>Re: Measure correlation between binary variables in classification task</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Measure-correlation-between-binary-variables-in-classification/m-p/350258#M18362</link>
      <description>&lt;P&gt;Thank you for your replies.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The final objective of this step is to create a single new variable to include in the predictive model and then drop the 5 binary variables.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;To do that, I am looking for a technique that finds the weight to assign to each binary variable when building the new one.&lt;/P&gt;</description>
      <pubDate>Sat, 15 Apr 2017 14:31:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Measure-correlation-between-binary-variables-in-classification/m-p/350258#M18362</guid>
      <dc:creator>Seymour93</dc:creator>
      <dc:date>2017-04-15T14:31:49Z</dc:date>
    </item>
    <item>
      <title>Re: Measure correlation between binary variables in classification task</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Measure-correlation-between-binary-variables-in-classification/m-p/350283#M18366</link>
      <description>&lt;P&gt;Sure sounds to me like a task for PROC LOGISTIC. Take a look at:&amp;nbsp;&lt;A href="https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_logistic_sect060.htm" target="_blank"&gt;https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_logistic_sect060.htm&lt;/A&gt;&lt;/P&gt;
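&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A minimal sketch of that idea, assuming a data set work.customers with a 0/1 target named target and the five indicators named camp1 to camp5 (all hypothetical names): the fitted coefficients play the role of the weights, and the linear predictor can be saved as one combined variable.&lt;/P&gt;
&lt;PRE&gt;
/* Sketch only: data set and variable names are assumptions */
proc logistic data=work.customers;
  /* Model the probability that target = 1 */
  model target(event='1') = camp1-camp5;
  /* XBETA= saves the linear predictor (weighted sum of the   */
  /* five indicators plus the intercept) as a single variable */
  output out=work.scored xbeta=camp_score;
run;
&lt;/PRE&gt;
&lt;P&gt;Because the weights are estimated from the target, it is probably safer to fit them on the training partition only before using camp_score as an input elsewhere.&lt;/P&gt;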
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Art, CEO, AnalystFinder.com&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 15 Apr 2017 16:54:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Measure-correlation-between-binary-variables-in-classification/m-p/350283#M18366</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2017-04-15T16:54:32Z</dc:date>
    </item>
    <item>
      <title>Re: Measure correlation between binary variables in classification task</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Measure-correlation-between-binary-variables-in-classification/m-p/350336#M18367</link>
      <description>&lt;PRE&gt;
Also check PROC DISTANCE, which can calculate the distance between categorical variables.
&lt;/PRE&gt;</description>
      <pubDate>Sun, 16 Apr 2017 02:26:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Measure-correlation-between-binary-variables-in-classification/m-p/350336#M18367</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2017-04-16T02:26:29Z</dc:date>
    </item>
  </channel>
</rss>

