Logistic Regression and Cluster Analysis

Occasional Contributor
Posts: 14

As part of my MS in Analytics program, I had an opportunity to discuss logistic regression and cluster analysis. I wanted to share my summary with the SAS community.

 

| Component | Logistic Regression | Cluster Analysis |
| --- | --- | --- |
| Typical application (used when) | The response variable is categorical, for example a binary outcome (1 or 0) recording whether something happened (e.g., a customer did or did not respond to a sales promotion). | Groups the data so that points with similar characteristics fall in the same cluster and points with dissimilar characteristics fall in different clusters. |
| Data type | Independent variables are categorical or continuous. | Variables are categorical or continuous. |
| Method | Supervised method (has a target variable). | Unsupervised method (no target variable). |
| How | Uses probabilities and odds ratios to interpret the estimated parameters. | Measures similarity between data points with a distance criterion such as Euclidean or Jaccard distance. |
| Advantages | (1) Robust and flexible method. (2) Does not assume a particular distribution for the predictor variables. (3) Interpretation is meaningful when the response variable is categorical and the predictor variables are categorical or continuous. | (1) Can stand alone or support a subsequent regression analysis. (2) Used in market segmentation, e.g., to partition customers so that each cluster receives customized promotions. (3) Useful when multiple groups exist in the dataset: the partitioned clusters can be presented for executive decisions on the problem of interest. |
| Disadvantages | (1) Needs a relatively large number of data points for meaningful and stable results. (2) Requires extensive training for analysts to present output to non-technical stakeholders. | (1) No standard method exists to define clusters. (2) The number of clusters is hard to determine when the distances between groups (Euclidean or otherwise) are small. |
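To make the "How" row for cluster analysis a bit more concrete, here is a minimal Python sketch of the two distance measures named above. The function names and the sample customers are my own illustration, not part of the original comparison or of any SAS procedure.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two numeric points of equal length."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| for two collections of categorical attributes."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


# Hypothetical customers: (age, monthly spend) for the continuous case,
# and sets of purchased product categories for the categorical case.
continuous_gap = euclidean_distance((30, 200), (33, 204))
categorical_gap = jaccard_distance({"books", "music"}, {"books", "games"})
print(continuous_gap, categorical_gap)
```

Euclidean distance suits continuous variables (usually after standardizing them so that no single variable dominates), while Jaccard distance suits binary or set-valued attributes. Smaller values mean the two points are more likely to end up in the same cluster.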

 

References:

 

Huang, F. L., & Moon, T. R. (2013). What are the odds of that? A primer on understanding logistic regression. Gifted Child Quarterly, 57(3), 197–204. 

 

Yuhua, F. (2012). Analysis on algorithm and application of cluster in data mining. Journal of Theoretical & Applied Information Technology, 46(1), 416–419.

 

Pohar, M., Blas, M., & Turk, S. (2004). Comparison of logistic regression and linear discriminant analysis: A simulation study, 143–161.

http://mrvar2.fdv.uni-lj.si/pub/mz/mz1.1/pohar.pdf

Trusted Advisor
Posts: 1,671

Re: Logistic Regression and Cluster Analysis

Logistic regression is not limited to binary responses; it can also be used when the response has more than two categories.
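As a rough illustration of the multi-category case, the sketch below uses one common formulation (the baseline-category, or generalized logit, model), in which each non-reference category gets its own linear score and the scores are turned into probabilities with a softmax. The coefficients, category labels, and function names are hypothetical and are not output from any fitted model.

```python
import math


def softmax(scores):
    """Convert one linear score per category into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def category_probabilities(x, coefficients):
    """coefficients maps category -> (intercept, slope) for a single predictor x.

    The reference category is represented with intercept 0 and slope 0.
    """
    labels = list(coefficients)
    scores = [coefficients[k][0] + coefficients[k][1] * x for k in labels]
    return dict(zip(labels, softmax(scores)))


# Hypothetical coefficients for a three-level response (illustration only).
HYPOTHETICAL_COEFS = {
    "no response": (0.0, 0.0),  # reference category
    "clicked": (-1.0, 0.5),
    "purchased": (-2.0, 0.8),
}

probs = category_probabilities(2.0, HYPOTHETICAL_COEFS)
print(probs)
```

With only two categories this reduces to the usual binary logistic curve, which is why the binary model is a special case rather than a separate technique.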

 

I disagree with the tone of this statement: "Requires extensive training for analysts to interpret results and potential for error results in failure costs for statistical analysis." Extensive training, maybe and maybe not; the idea of logistic regression and certain outputs from PROC LOGISTIC aren't that hard to grasp.
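For instance, the link between a coefficient and its odds ratio takes only a few lines to show. This is a small Python sketch with made-up numbers (the intercept, slope, and predictor values are hypothetical, not taken from any PROC LOGISTIC run):

```python
import math


def probability(intercept, coefficient, x):
    """Predicted probability of the event from a one-predictor logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + coefficient * x)))


def odds(p):
    """Odds of the event: probability it happens over probability it does not."""
    return p / (1.0 - p)


def odds_ratio(coefficient):
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return math.exp(coefficient)


# Hypothetical fitted values, for illustration only.
INTERCEPT, SLOPE = -1.5, 0.7

p_low = probability(INTERCEPT, SLOPE, 2)
p_high = probability(INTERCEPT, SLOPE, 3)

# The ratio of the two odds matches exp(SLOPE), whatever starting x is chosen.
print(odds(p_high) / odds(p_low), odds_ratio(SLOPE))
```

The point is that the odds ratio stays the same at every value of the predictor, while the change in probability does not, which is one reason odds ratios are the usual summary of the coefficients.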

Occasional Contributor
Posts: 14

Re: Logistic Regression and Cluster Analysis

Thank you, PaigeMiller, for the corrections. I appreciate it.

 

My intent was to discuss the typical application of logistic regression to a categorical response variable, such as a binary response. On the extensive-training point, I was thinking of the effort involved in explaining logistic regression output to non-technical stakeholders.

 

Thank you,

Murali

 

 
