New Contributor

# Measure correlation between binary variables in classification task

Hello everyone,

I have always found the SAS Community very useful. This time, however, I could not find what I was looking for, so I am posting for the first time.

I am working on a marketing classification task using SAS Enterprise Miner (latest version). I have 30 variables and must predict whether a customer will accept or refuse our next direct marketing offer.

Besides the target variable and the socio-demographic and firmographic variables, I have five binary variables. Each of them indicates whether the customer responded to one of the previous marketing offers (campaigns 1 to 5).

I want to understand the correlation among these five binary variables and, ultimately, how useful this binary vector is for predicting the target variable.

After some research, I found that the best candidates are the phi coefficient (PROC CORR with the PEARSON option applied to the binary variables) and the tetrachoric correlation (the special case of the polychoric correlation for two binary variables).

I noticed that the tetrachoric correlation comes out much higher than the phi coefficient. Do you know why?
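A small worked illustration of why the two measures can differ so much, written in Python rather than SAS and using hypothetical counts. The phi coefficient is Pearson's r on the 0/1 codes, so it describes the observed binary variables; its maximum falls below 1 whenever the two response rates differ, and with rare responses it tends to be small. The tetrachoric correlation instead estimates the correlation of two assumed underlying normal variables that were cut into 0/1, so it is typically larger in absolute value. The sketch estimates it by finding the correlation whose bivariate normal probability matches the observed share of (0, 0) customers, computing that probability with Plackett's identity and Simpson's rule. The function names and this numerical approach are illustrative assumptions (not what any SAS procedure does internally), and all four cells are assumed to be non-zero.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def phi_coefficient(n11, n10, n01, n00):
    """Phi coefficient of a 2x2 table (equals Pearson r on the 0/1 codes).
    n10 = count with x=1, y=0; n01 = count with x=0, y=1."""
    num = n11 * n00 - n10 * n01
    den = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return num / den


def _bvn_density(h, k, r):
    """Standard bivariate normal density at (h, k) with correlation r."""
    q = (h * h - 2 * r * h * k + k * k) / (2 * (1 - r * r))
    return exp(-q) / (2 * pi * sqrt(1 - r * r))


def _bvn_cdf(h, k, rho, steps=2000):
    """P(X <= h, Y <= k) via Plackett's identity:
    Phi2(h, k; rho) = Phi(h) * Phi(k) + integral from 0 to rho of density(h, k, r) dr,
    with the integral computed by composite Simpson's rule (steps must be even)."""
    step = rho / steps
    total = _bvn_density(h, k, 0.0) + _bvn_density(h, k, rho)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * _bvn_density(h, k, i * step)
    return STD_NORMAL.cdf(h) * STD_NORMAL.cdf(k) + total * step / 3


def tetrachoric(n11, n10, n01, n00, tol=1e-8):
    """Correlation of two latent standard normals that, cut at fixed
    thresholds, reproduce the observed 2x2 table."""
    n = n11 + n10 + n01 + n00
    # Thresholds chosen so that P(latent <= threshold) equals the share of 0s.
    h = STD_NORMAL.inv_cdf((n01 + n00) / n)  # share with x = 0
    k = STD_NORMAL.inv_cdf((n10 + n00) / n)  # share with y = 0
    target = n00 / n  # observed share with x = 0 and y = 0
    lo, hi = -0.999, 0.999
    # Phi2 increases with rho, so bisection finds the unique matching value.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if _bvn_cdf(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Hypothetical counts: two campaigns, each answered by 5% of 1,000 customers.
    table = (20, 30, 30, 920)  # (n11, n10, n01, n00)
    print(f"phi         = {phi_coefficient(*table):.3f}")
    print(f"tetrachoric = {tetrachoric(*table):.3f}")
```

With these made-up counts phi is 17500/47500, about 0.37, while the tetrachoric estimate comes out well above it, which mirrors the pattern described in the question. Which number is appropriate depends on whether the latent-normal assumption is defensible for campaign responses.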

In your experience, which correlation measure is best in this context?

Thank you very much and enjoy your Easter.

PROC Star

## Re: Measure correlation between binary variables in classification task

Posted in reply to Seymour93

Here is a short article on the topic: http://www.john-uebersax.com/stat/tetra.htm

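The main thing you would have to justify is the assumption behind it: the tetrachoric correlation treats each binary variable as a cut-off version of an underlying, normally distributed continuous variable.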

Art, CEO, AnalystFinder.com

New Contributor

## Re: Measure correlation between binary variables in classification task

Thank you.

So the safest approach would be to use phi?

PROC Star

## Re: Measure correlation between binary variables in classification task


I'm not a statistician, so I can't really provide defensible advice. Personally, I would use phi or, if the goal is to predict from those variables (which I think you said is the task), logistic regression.


New Contributor

## Re: Measure correlation between binary variables in classification task

Thank you for your replies.

The final objective of this step is to create a single new variable to include in the predictive model, so that the five binary variables can be dropped.

What I am looking for is a technique to determine the weight to assign to each binary variable when building the new one.

PROC Star

## Re: Measure correlation between binary variables in classification task


Sure sounds to me like a task for PROC LOGISTIC. Take a look at: https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_logistic_se...
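A minimal sketch of that approach, written in Python rather than SAS so it runs stand-alone: fit a logistic regression of the target on the five campaign flags and use the fitted linear predictor (the log-odds) as the single new variable, so the estimated coefficients act as the weights. The simulated data, the `true_beta` values, and all function names are hypothetical, and the hand-written Newton-Raphson fit only stands in for what PROC LOGISTIC does (PROC LOGISTIC also reports standard errors, tests, and diagnostics).

```python
import random
from math import exp


def simulate_campaigns(n, seed=0):
    """Hypothetical data: five 0/1 campaign flags per customer and a 0/1
    target drawn from an assumed logistic model (illustration only)."""
    rng = random.Random(seed)
    true_beta = [-2.0, 1.2, 0.3, 0.8, 0.0, 0.6]  # intercept, then campaigns 1-5
    rows, y = [], []
    for _ in range(n):
        flags = [1 if rng.random() < 0.25 else 0 for _ in range(5)]
        eta = true_beta[0] + sum(b * f for b, f in zip(true_beta[1:], flags))
        rows.append(flags)
        y.append(1 if rng.random() < 1.0 / (1.0 + exp(-eta)) else 0)
    return rows, y


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_logistic(rows, y, iterations=25):
    """Maximum-likelihood logistic regression by Newton-Raphson.
    Returns [intercept, w1, ..., wp]."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iterations):
        grad = [0.0] * p                     # X'(y - mu)
        info = [[0.0] * p for _ in range(p)]  # X'WX, W = mu(1 - mu)
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def campaign_score(beta, flags):
    """The combined variable: fitted log-odds for one customer's five flags."""
    return beta[0] + sum(b * f for b, f in zip(beta[1:], flags))


if __name__ == "__main__":
    rows, y = simulate_campaigns(2000, seed=42)
    beta = fit_logistic(rows, y)
    print("intercept and weights:", [round(b, 2) for b in beta])
    print("score, campaigns 1 and 3 answered:",
          round(campaign_score(beta, [1, 0, 1, 0, 0]), 2))
```

Weights fitted on the five flags alone can shift once the socio-demographic and firmographic variables enter the final model, so it may be worth comparing this score against simply keeping the five flags as inputs.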


Super User

## Re: Measure correlation between binary variables in classification task

Also check PROC DISTANCE, which can calculate distances between categorical variables.
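As a rough illustration of what a distance between binary variables measures, here is a Python sketch of two common choices, computed between pairs of campaign columns: the simple matching distance, which counts any disagreement, and the Jaccard distance, which ignores customers who answered neither campaign. The data and function names are hypothetical, and this is not PROC DISTANCE's implementation.

```python
from itertools import combinations


def matching_distance(a, b):
    """Share of customers whose two flags disagree (joint 0s count as agreement)."""
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def jaccard_distance(a, b):
    """1 - (answered both) / (answered at least one); joint 0s are ignored.
    Returns 0.0 when nobody answered either campaign."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return 0.0 if either == 0 else 1.0 - both / either


def pairwise(columns, metric):
    """Apply `metric` to every pair of named 0/1 columns."""
    return {(i, j): metric(columns[i], columns[j])
            for i, j in combinations(columns, 2)}


if __name__ == "__main__":
    # Hypothetical responses of eight customers to three campaigns.
    campaigns = {
        "cmp1": [1, 0, 0, 1, 0, 0, 0, 0],
        "cmp2": [1, 0, 0, 1, 0, 0, 1, 0],
        "cmp3": [0, 1, 0, 0, 0, 0, 0, 0],
    }
    for name, metric in [("matching", matching_distance),
                         ("jaccard", jaccard_distance)]:
        for pair, d in pairwise(campaigns, metric).items():
            print(name, pair, round(d, 3))
```

When responses are rare, most customers are 0 on every campaign, so the matching distance is small for almost any pair; the Jaccard distance, which drops those joint non-responders, is often the more informative of the two in that situation.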