Seymour93
Calcite | Level 5

Hello everyone,

I have always found the SAS Community very useful. For the first time, I have not found what I am looking for, so here I am posting for the first time. 🙂

 

I am working on a marketing classification task using SAS Enterprise Miner (latest version). I have 30 variables, and I must predict whether the customer will accept or refuse our next direct marketing offer.

 

Besides the target variable and the socio-demographic and firmographic variables, I have 5 binary variables. Each of these binary variables indicates whether the customer responded to one of the previous marketing offers (campaigns 1 to 5).

 

What I want is to understand the correlation among these five binary variables and, eventually, how much this binary vector is worth in predicting the target variable.

 

After some research, I found that the best candidates are the phi coefficient (computed with PROC CORR PEARSON on the binary variables) and the tetrachoric correlation (the special case of the polychoric correlation for two binary variables).
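Why PROC CORR's Pearson correlation gives phi can be made concrete: for two 0/1 variables, Pearson's r on the raw codes equals the phi coefficient of their 2x2 table, φ = (ad − bc) / √((a+b)(c+d)(a+c)(b+d)). The Python sketch below (not SAS; the two toy vectors are hypothetical campaign-response flags and the function names are illustrative only) computes both and shows that they agree.

```python
import math

def table_from_vectors(x, y):
    """Cell counts of the 2x2 table of two 0/1 vectors:
    a = both 1, b = x only, c = y only, d = neither."""
    pairs = list(zip(x, y))
    a = pairs.count((1, 1))
    b = pairs.count((1, 0))
    c = pairs.count((0, 1))
    d = pairs.count((0, 0))
    return a, b, c, d

def phi_from_table(a, b, c, d):
    """Phi coefficient: (ad - bc) / sqrt of the product of the four margins."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def pearson(x, y):
    """Ordinary Pearson correlation, applied here to the raw 0/1 codes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical response flags of 10 customers to campaigns 1 and 2
campaign1 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
campaign2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]

cells = table_from_vectors(campaign1, campaign2)
print("cells (a, b, c, d):", cells)
print("phi from table:", phi_from_table(*cells))
print("Pearson on 0/1:", pearson(campaign1, campaign2))
```

Computing a phi matrix for all five flags is just this calculation repeated for every pair of campaigns.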

 

I found that the latter measure gives a much higher correlation than phi. Do you know why?
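A common explanation for the gap is as follows. Phi is Pearson's r on the observed 0/1 codes. The tetrachoric correlation instead assumes that each binary flag is a dichotomized version of a latent, normally distributed variable, and it estimates the correlation of those latent variables. Dichotomizing attenuates correlation, especially when the split is lopsided (for example, few customers responding to a campaign), so the tetrachoric value is usually larger in magnitude than phi. The Python sketch below illustrates this using the classical cos-pi closed-form approximation, r_tet ≈ cos(π / (1 + √(ad/bc))). This is a rough approximation, not the estimator a SAS procedure would report, and the 2x2 counts are hypothetical.

```python
import math

def phi_from_table(a, b, c, d):
    """Phi coefficient of a 2x2 table (a = both 1, b = X only, c = Y only, d = neither)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def tetrachoric_cos_pi(a, b, c, d):
    """Cos-pi approximation to the tetrachoric correlation:
    cos(pi / (1 + sqrt(ad / bc))). Requires all four cells to be positive."""
    if min(a, b, c, d) <= 0:
        raise ValueError("cos-pi approximation needs all four cells > 0")
    return math.cos(math.pi / (1 + math.sqrt((a * d) / (b * c))))

# Hypothetical counts for 1,000 customers where responses are rare:
# 20 responded to both campaigns, 30 only to the first, 40 only to the second.
a, b, c, d = 20, 30, 40, 910
phi = phi_from_table(a, b, c, d)
tet = tetrachoric_cos_pi(a, b, c, d)
print(f"phi = {phi:.3f}, approx. tetrachoric = {tet:.3f}")
```

With rare responses like these, the two measures differ by a factor of more than two even though both are computed from the same table. Which one is "right" depends on whether the latent-normal assumption is defensible for campaign responses.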

 

In this context, which correlation measure would you recommend, based on your experience?

 

Thank you very much and enjoy your Easter.

6 REPLIES
art297
Opal | Level 21

Here is a short article on the topic: http://www.john-uebersax.com/stat/tetra.htm

 

The main questions you would have to justify concern its assumptions.

 

Art, CEO, AnalystFinder.com

 

Seymour93
Calcite | Level 5

Thank you.

 

So would the safest approach be to use phi?

art297
Opal | Level 21

I'm not a statistician, so I can't really provide defensible advice. I personally would use phi or, if I were trying to predict based on those variables (which I think you said was the task), logistic regression.

 


 

Seymour93
Calcite | Level 5

Thank you for your replies.

 

The final objective of this step is to create a new variable to include in the predictive model and, therefore, drop these 5 binary variables.

 

However, I am looking for a technique to find the weight to assign to each binary variable when creating the new one.
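One standard way to obtain such weights is logistic regression. Fitting the target on the five campaign flags gives one coefficient per flag, and the fitted linear predictor (or the predicted probability, which ranks customers the same way) can serve as the single new variable. The Python sketch below shows the idea with plain batch gradient descent. It is not PROC LOGISTIC, and the synthetic data, the true coefficients, and the `campaign_score` helper are hypothetical. If you build such a score, fit it on the training partition only. Otherwise the new variable carries target information into the validation data.

```python
import math
import random

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=1.0, iterations=500):
    """Batch gradient descent on the average negative log-likelihood.
    Returns (intercept, list of per-flag weights)."""
    n, k = len(X), len(X[0])
    b0, w = 0.0, [0.0] * k
    for _ in range(iterations):
        g0, g = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            # Residual between predicted probability and observed response
            err = sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            g0 += err
            for j in range(k):
                g[j] += err * xi[j]
        b0 -= lr * g0 / n
        w = [wj - lr * gj / n for wj, gj in zip(w, g)]
    return b0, w

def campaign_score(b0, w, flags):
    """Hypothetical combined variable: the fitted log-odds for one customer."""
    return b0 + sum(wj * fj for wj, fj in zip(w, flags))

# Hypothetical synthetic data: 5 prior-campaign flags; only campaigns 1 and 3 matter.
rng = random.Random(42)
true_b0, true_w = -2.0, [1.5, 0.0, 0.8, 0.0, 0.0]
X, y = [], []
for _ in range(400):
    flags = [1 if rng.random() < 0.3 else 0 for _ in range(5)]
    p = sigmoid(true_b0 + sum(t * f for t, f in zip(true_w, flags)))
    X.append(flags)
    y.append(1 if rng.random() < p else 0)

b0, w = fit_logistic(X, y)
print("intercept:", round(b0, 2), "weights:", [round(v, 2) for v in w])
print("score for a customer who responded to campaigns 1 and 3:",
      round(campaign_score(b0, w, [1, 0, 1, 0, 0]), 2))
```

The estimated weights should land near the true ones, with little weight on the campaigns that carry no signal. In practice, the same logic applies to coefficients estimated by PROC LOGISTIC.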

art297
Opal | Level 21

That sure sounds to me like a task for PROC LOGISTIC. Take a look at: https://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_logistic_se...

 


 

Ksharp
Super User
Also check PROC DISTANCE, which can calculate the distance between categorical variables.
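Two typical dissimilarity measures for binary flags are simple matching distance and Jaccard distance. Simple matching distance is the share of customers on which two flags disagree. Jaccard distance ignores customers who responded to neither campaign, which matters when responses are rare and shared zeros would otherwise dominate. Which of these PROC DISTANCE offers through its METHOD= option should be checked in its documentation. The Python sketch below only illustrates the two measures on hypothetical campaign columns, computing distances between variables rather than between customers.

```python
def simple_matching_distance(x, y):
    """Share of customers on which two binary flags disagree."""
    return sum(1 for a, b in zip(x, y) if a != b) / len(x)

def jaccard_distance(x, y):
    """1 - (both 1) / (at least one 1); joint 0-0 rows are ignored."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return 0.0 if either == 0 else 1.0 - both / either

def distance_matrix(columns, metric):
    """Pairwise distances between variables (columns), not between customers."""
    return [[metric(ci, cj) for cj in columns] for ci in columns]

# Hypothetical response flags of 10 customers to three campaigns
campaigns = [
    [1, 1, 1, 0, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
]

for name, metric in [("simple matching", simple_matching_distance),
                     ("Jaccard", jaccard_distance)]:
    print(name)
    for row in distance_matrix(campaigns, metric):
        print("  ", [round(v, 2) for v in row])
```

Campaigns 1 and 3 share no responders, so their Jaccard distance is 1.0. Simple matching puts them only 0.5 apart because many customers responded to neither.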
