Measure correlation between binary variables in cl...

04-15-2017 07:47 AM

Hello everyone,

I have always found the SAS community very useful. This is the first time I have not found what I was looking for, so here I am with my first post.

I am working on a classification task for marketing using SAS Enterprise Miner (latest version). I have 30 variables and I must predict whether the customer will accept or refuse our next direct marketing offer.

Besides the target variable and the socio-demographic and firmographic variables, I have **5 binary variables**. Each of these binary variables represents whether the customer responded to a previous marketing offer (campaign 1 to campaign 5).

What I want is to understand the correlation among these five binary variables and, ultimately, how much this binary vector is worth in predicting the target variable.

After some research, I found that the best candidates are Phi (PROC CORR with the PEARSON option on binary variables) and the tetrachoric correlation (the special case of the polychoric correlation for binary variables).

I discovered that with the latter correlation measure, I obtain a much higher correlation compared to the Phi. Do you know why?

In this context, what is the best correlation measure in your experience?

Thank you very much and enjoy your Easter.

04-15-2017 10:04 AM

Here is a short article on the topic: http://www.john-uebersax.com/stat/tetra.htm

The main questions you would have to answer concern the assumptions behind the measure.

Art, CEO, AnalystFinder.com
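
To make the difference between the two measures concrete, here is a minimal Python sketch (not SAS, and not the method used by any SAS procedure) that computes both from a single 2x2 table. Phi is the Pearson correlation of the observed 0/1 values. The tetrachoric correlation instead assumes each binary variable is a thresholded standard normal variable and estimates the correlation of those latent variables, which is why it generally comes out larger in magnitude. The function names, the cell layout `a, b, c, d`, and the example counts are hypothetical, and the sketch does not handle tables with empty cells.

```python
import math
from statistics import NormalDist

def phi_coefficient(a, b, c, d):
    """Phi from 2x2 counts: a = both 1, b = x only, c = y only, d = both 0."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

def _simpson(f, lo, hi, n=200):
    """Composite Simpson's rule with an even number of intervals n."""
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * step)
    return total * step / 3

def _upper_quadrant_prob(h, k, rho):
    """P(X > h, Y > k) for a standard bivariate normal with correlation rho.

    Uses the identity P = P(X > h) P(Y > k) + integral from 0 to rho of the
    bivariate normal density evaluated at (h, k) with correlation r.
    """
    std = NormalDist()

    def density(r):
        q = (h * h - 2 * r * h * k + k * k) / (2 * (1 - r * r))
        return math.exp(-q) / (2 * math.pi * math.sqrt(1 - r * r))

    independent = (1 - std.cdf(h)) * (1 - std.cdf(k))
    return independent + _simpson(density, 0.0, rho)

def tetrachoric(a, b, c, d):
    """Tetrachoric correlation by bisection on the latent correlation."""
    n = a + b + c + d
    std = NormalDist()
    h = std.inv_cdf(1 - (a + b) / n)  # threshold for x
    k = std.inv_cdf(1 - (a + c) / n)  # threshold for y
    target = a / n                    # observed share with both responses = 1
    lo, hi = -0.999, 0.999
    for _ in range(60):
        mid = (lo + hi) / 2
        if _upper_quadrant_prob(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Made-up counts for two campaign-response flags.
phi = phi_coefficient(40, 10, 10, 40)
rho = tetrachoric(40, 10, 10, 40)
print(f"phi = {phi:.3f}, tetrachoric = {rho:.3f}")
```

For this symmetric table (both variables split 50/50) phi is 0.6, and the tetrachoric value has the closed form sin(pi * phi / 2), about 0.809. The gap only means something if the latent-normality assumption is plausible for campaign responses, which is the assumption question raised above.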

04-15-2017 10:09 AM

Thank you.

So the safest approach would be to use Phi?

04-15-2017 10:26 AM - edited 04-15-2017 10:27 AM

I'm not a statistician so can't really provide defensible advice. I personally would use phi or, if I'm trying to predict based on those variables (which I think you said was the task), logistic regression.

Art, CEO, AnalystFinder.com

04-15-2017 10:31 AM

Thank you for your replies.

The final objective of this step is to create a new variable to include in the predictive model and then drop these five binary variables.

To do that, I was looking for a technique to find the weight to assign to each variable when building the new one.

04-15-2017 12:54 PM

Sure sounds to me like a task for PROC LOGISTIC. Take a look at: https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_logistic_se...

Art, CEO, AnalystFinder.com
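
As a rough illustration of the logistic-regression idea, the Python sketch below (again not SAS) fits a logistic model of the target on five binary campaign flags by plain gradient ascent and uses the fitted coefficients as the weights of one combined score. Everything in it is an assumption made for illustration: the data are synthetic, the "true" weights are invented, and the function names are hypothetical. A statistical procedure such as PROC LOGISTIC would estimate the coefficients with a more robust algorithm and report standard errors as well.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, targets, lr=0.5, iters=300):
    """Fit intercept and weights by gradient ascent on the mean log-likelihood."""
    n = len(rows)
    intercept = 0.0
    weights = [0.0] * len(rows[0])
    for _ in range(iters):
        grad_b = 0.0
        grad_w = [0.0] * len(weights)
        for x, y in zip(rows, targets):
            err = y - sigmoid(intercept + sum(w * xi for w, xi in zip(weights, x)))
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        intercept += lr * grad_b / n
        weights = [w + lr * g / n for w, g in zip(weights, grad_w)]
    return intercept, weights

def mean_log_likelihood(rows, targets, intercept, weights):
    total = 0.0
    for x, y in zip(rows, targets):
        p = sigmoid(intercept + sum(w * xi for w, xi in zip(weights, x)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(rows)

def combined_score(x, intercept, weights):
    """Single new variable: predicted acceptance probability from the 5 flags."""
    return sigmoid(intercept + sum(w * xi for w, xi in zip(weights, x)))

# Synthetic data only: 400 customers, 5 past-campaign flags, invented effects.
rng = random.Random(0)
true_weights = [1.5, 1.0, 0.5, 0.0, 0.8]  # hypothetical, for data generation
rows, targets = [], []
for _ in range(400):
    x = [1 if rng.random() < 0.3 else 0 for _ in range(5)]
    p = sigmoid(-2.0 + sum(w * xi for w, xi in zip(true_weights, x)))
    rows.append(x)
    targets.append(1 if rng.random() < p else 0)

intercept, weights = fit_logistic(rows, targets)
print("weights per campaign:", [round(w, 2) for w in weights])
print("score for a customer who accepted campaigns 1 and 2:",
      round(combined_score([1, 1, 0, 0, 0], intercept, weights), 3))
```

The fitted weights answer the "which weight for each variable" question directly, and the score (or the linear predictor before the sigmoid) can replace the five flags. Because the weights are learned from the target, they should be estimated on training data only, or the new variable will leak information into validation.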

04-15-2017 10:26 PM

Also check PROC DISTANCE, which can calculate distances between categorical variables.
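
For a sense of what a distance or similarity between two binary variables looks like, here is a small Python sketch of two common measures, the simple matching coefficient and the Jaccard coefficient. It is an illustration only: the function name and the example vectors are made up, and which measures PROC DISTANCE offers, and under which option names, should be checked in its documentation.

```python
def binary_similarity(x, y):
    """Return (simple matching, Jaccard) for two equal-length 0/1 sequences."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    neither = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    mismatch = len(x) - both - neither
    simple_matching = (both + neither) / len(x)
    # Jaccard ignores joint non-responses; undefined if nobody responded at all.
    jaccard = both / (both + mismatch) if (both + mismatch) > 0 else float("nan")
    return simple_matching, jaccard

# Made-up responses of six customers to two campaigns.
campaign_1 = [1, 1, 0, 0, 1, 0]
campaign_2 = [1, 0, 0, 0, 1, 1]
sm, jac = binary_similarity(campaign_1, campaign_2)
print(f"simple matching = {sm:.3f}, Jaccard = {jac:.3f}")
```

The two differ in how they treat customers who responded to neither campaign: simple matching counts them as agreement, Jaccard leaves them out. With low response rates, as is typical in direct marketing, that choice can change the picture considerably.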