Measure correlation between binary variables in classification task

New Contributor
Posts: 3

Hello everyone,

I have always found the SAS community very useful. For the first time, I have not found what I am looking for, so here I am posting for the first time.

I am working on a classification task for marketing using SAS Enterprise Miner (latest version). I have 30 variables and I must predict whether the customer will accept or refuse our next direct marketing offer.

Besides the target variable and the socio-demographic and firmographic variables, I have 5 binary variables. Each of these binary variables represents whether the customer responded to one of the previous marketing offers (campaign 1 to campaign 5).

What I want is to understand the correlation among these five binary variables and, eventually, the worth of this binary vector in predicting the target variable.

After some research, I found that the best candidates are the phi coefficient (using PROC CORR with the PEARSON option on binary variables) and the tetrachoric correlation (a special case of the polychoric correlation for binary variables).

With the latter measure, I obtain a much higher correlation than with phi. Do you know why?

In your experience, what is the best correlation measure in this context?

Thank you very much and enjoy your Easter.

Esteemed Advisor
Posts: 7,056

Re: Measure correlation between binary variables in classification task

Here is a short article on the topic: http://www.john-uebersax.com/stat/tetra.htm

The main points you would have to justify concern the assumptions.
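
To make the role of the assumptions concrete, here is a minimal sketch in Python (not SAS) on a made-up 2x2 table. Phi is simply Pearson's r computed on the 0/1 codes. The tetrachoric correlation instead assumes that each binary variable is a thresholded latent normal variable and estimates the correlation between those latent variables, which is why it typically comes out larger in magnitude than phi for the same table. The tetrachoric value below uses the classic cosine-pi approximation, not the iterative estimate a statistical package would compute, and that approximation is most reliable when both variables are split close to 50/50, which is why the hypothetical counts are balanced. The function names and counts are illustrative assumptions.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for the 2x2 table [[a, b], [c, d]] (Pearson's r on 0/1 codes)."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom

def tetrachoric_approx(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    Assumes all four cell counts are non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    return math.cos(math.pi / (1 + math.sqrt(odds_ratio)))

# Hypothetical counts: rows = responded to campaign 1 (yes/no),
# columns = responded to campaign 2 (yes/no).
a, b, c, d = 300, 200, 200, 300

phi = phi_coefficient(a, b, c, d)
tet = tetrachoric_approx(a, b, c, d)
print(f"phi = {phi:.3f}, tetrachoric (approx) = {tet:.3f}")
# prints: phi = 0.200, tetrachoric (approx) = 0.309
```

Both numbers describe the same table. Which one is appropriate depends on whether the latent-normal assumption is defensible for campaign responses.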

Art, CEO, AnalystFinder.com

New Contributor
Posts: 3

Re: Measure correlation between binary variables in classification task

Thank you.

So the safest approach would be to use phi?

Esteemed Advisor
Posts: 7,056

Re: Measure correlation between binary variables in classification task

I'm not a statistician, so I can't really provide defensible advice. Personally, I would use phi or, if the goal is to predict from those variables (which I think you said was the task), logistic regression.

New Contributor
Posts: 3

Re: Measure correlation between binary variables in classification task

Thank you for your replies.

The final objective of this step is to create a single new variable to include in the predictive model and, therefore, to drop the five binary variables.

However, I am looking for a technique to find the weight to assign to each variable in order to create the new one.

Esteemed Advisor
Posts: 7,056

Re: Measure correlation between binary variables in classification task

Sure sounds to me like a task for PROC LOGISTIC. Take a look at: https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_logistic_se...
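
As a rough illustration of how logistic regression yields such weights, here is a minimal sketch in plain Python (not PROC LOGISTIC output). It fits an intercept and one coefficient per binary variable by gradient descent, then uses the coefficients as weights for a single composite score. The data are made up so that acceptance depends only on campaigns 1 and 2. The helper names `fit_logistic` and `score`, the learning rate, and the number of epochs are assumptions chosen for this example.

```python
import math
from itertools import product

def fit_logistic(X, y, lr=1.0, epochs=1000):
    """Fit intercept + weights by batch gradient descent on the log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Made-up data: every combination of the 5 past responses appears 5 times,
# and the number of acceptances depends only on campaigns 1 and 2.
accepts_out_of_5 = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
X, y = [], []
for combo in product([0, 1], repeat=5):
    k = accepts_out_of_5[combo[:2]]
    for i in range(5):
        X.append(combo)
        y.append(1 if i < k else 0)

weights = fit_logistic(X, y)
intercept, coefs = weights[0], weights[1:]

def score(row):
    """Composite variable: weighted sum of the five binary responses."""
    return sum(c * x for c, x in zip(coefs, row))

print([round(c, 2) for c in coefs])
print(score((1, 1, 0, 0, 0)), score((0, 0, 1, 1, 1)))
```

On this toy data, campaign 1 receives the largest weight, campaign 2 a smaller one, and campaigns 3 to 5 weights close to zero, so the composite score ranks customers by the responses that actually relate to the target. With real data the coefficients would come from the fitted model rather than from a hand-rolled optimizer like this one.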

Grand Advisor
Posts: 9,447

Re: Measure correlation between binary variables in classification task

Also check PROC DISTANCE, which can calculate distances between categorical variables.
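
For binary variables, two common distance measures of this kind are the simple matching distance and the Jaccard distance. The sketch below, in plain Python with made-up response vectors, shows what each one computes. It does not reproduce PROC DISTANCE itself, so check that procedure's documentation for the measures it offers and how to request them.

```python
def simple_matching_distance(u, v):
    """Share of positions where the two binary vectors disagree."""
    mismatches = sum(1 for a, b in zip(u, v) if a != b)
    return mismatches / len(u)

def jaccard_distance(u, v):
    """1 - (both are 1) / (at least one is 1); joint zeros are ignored."""
    both = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(u, v) if a == 1 or b == 1)
    return 0.0 if either == 0 else 1 - both / either

# Hypothetical responses of 8 customers to two campaigns.
campaign1 = [1, 1, 0, 0, 0, 0, 1, 0]
campaign2 = [1, 0, 0, 0, 0, 0, 1, 0]

smd = simple_matching_distance(campaign1, campaign2)
jac = jaccard_distance(campaign1, campaign2)
print(smd, round(jac, 3))
# prints: 0.125 0.333
```

The Jaccard distance ignores customers who responded to neither campaign, which can matter when response rates are low and joint non-response dominates the data.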
