noemi_b
Obsidian | Level 7

Hi all,

 

I am building a churn prediction model using logistic regression. My dataset is an unbalanced panel that records the behavior over time of a retail bank's 350,000 customers. My question is how SAS treats unbalanced panel data when running a logistic regression.

Can unbalanced panel data cause problems when running PROC LOGISTIC?

So far, I have removed or imputed missing values, detected outliers, and removed multicollinearity, so I am ready to start building the model. However, I am concerned that the unbalanced panel will cause problems when SAS analyzes it with PROC LOGISTIC.

Can you please explain how PROC LOGISTIC treats unbalanced panel data?

Thank you in advance.

2 REPLIES
Ksharp
Super User

Maybe you should take a look at a generalized linear mixed model.

PROC GLIMMIX
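
As a rough illustration of this suggestion, a random-intercept logistic model could be sketched as follows. The data set name `churn_panel` and the variables `customer_id`, `churn`, `tenure`, and `balance` are hypothetical placeholders, not names from the original question; this is a minimal sketch under those assumptions rather than a recommended specification.

```sas
/* Hypothetical names: churn_panel (one row per customer per period), */
/* customer_id, churn (0/1), tenure, balance.                         */
proc glimmix data=churn_panel;
   class customer_id;
   /* Binary response with logit link; model the probability of churn = 1 */
   model churn(event='1') = tenure balance / dist=binary link=logit solution;
   /* A random intercept per customer captures within-customer correlation */
   random intercept / subject=customer_id;
run;
```

Customers may have different numbers of rows in the data set; nothing in this specification requires a balanced panel.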

StatDave
SAS Super FREQ

Since you say it is panel data, I assume that you have repeated binary responses over time from each customer. Observations within one customer are likely correlated, and that correlation should be taken into account in the analysis.

Probably the most common approach for this is a Generalized Estimating Equations (GEE) logistic model, which can be fit using PROC GEE or PROC GENMOD. Use the REPEATED statement in either of these, specifying in the SUBJECT= option a variable that has a unique value for each customer. The data set should have multiple observations per customer. Specify the DIST=BINOMIAL option in the MODEL statement to fit a GEE logistic model.

Another approach is a conditional logistic model, which can be fit in PROC LOGISTIC using your customer variable in the STRATA statement.

Either approach allows unequal numbers of responses per customer. Some discussion of these models can be found in:

"Categorical Data Analysis Using SAS, Third Edition" (Stokes, M. et al., SAS Institute, 2012)

"Logistic Regression Using SAS: Theory and Application, Second Edition" (Allison, P., SAS Institute, 2012)

"Fixed Effects Regression Methods for Longitudinal Data Using SAS" (Allison, P., SAS Institute, 2005)
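
The two approaches described in this reply might look roughly like the following. The data set `churn_panel` and the variables `customer_id`, `churn`, `tenure`, and `balance` are hypothetical placeholders, and the exchangeable working correlation (TYPE=EXCH) is only one possible choice, assumed here for illustration.

```sas
/* Hypothetical names: churn_panel (one row per customer per period), */
/* customer_id, churn (0/1), tenure, balance.                         */

/* GEE logistic model: accounts for correlation within each customer */
proc genmod data=churn_panel;
   class customer_id;
   model churn(event='1') = tenure balance / dist=binomial link=logit;
   /* SUBJECT= identifies the customer; TYPE=EXCH is an assumed choice */
   repeated subject=customer_id / type=exch;
run;

/* Conditional logistic model: each customer is its own stratum */
proc logistic data=churn_panel;
   strata customer_id;
   model churn(event='1') = tenure balance;
run;
```

Neither call requires the same number of rows per customer. One point to keep in mind with the conditional model is that customers whose response never changes (for example, those who never churn) contribute nothing to the conditional likelihood, and effects of covariates that are constant within a customer cannot be estimated.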

 
