Unbalanced Panel Data and Logistic Regression for ...


07-18-2017 03:56 PM

Hi all,

I am building a churn prediction model using logistic regression. My dataset is an unbalanced panel that tracks the behavior over time of a retail bank's 350,000 customers. My question is how SAS treats unbalanced panel data when running a logistic regression.

Can unbalanced panel data cause problems when running PROC LOGISTIC?

So far, I have removed or imputed missing values, detected outliers, and removed multicollinearity. I am now ready to start building the model, but I am afraid that the unbalanced panel will cause problems when SAS analyzes it with PROC LOGISTIC.

Can you please explain how SAS treats unbalanced panel data in PROC LOGISTIC?

Thank you in advance.

Posted in reply to noemi_b

07-19-2017 09:37 AM

Maybe you should take a look at a generalized linear mixed model.

PROC GLIMMIX
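As a rough sketch of what that suggestion could look like, a random-intercept logistic model gives each customer its own intercept, which is one way to account for repeated observations on the same customer. The data set name `churn_panel` and the variables `customer_id`, `churn`, `tenure`, and `balance` are hypothetical placeholders, not names from the original post.

```sas
/* Sketch only: churn_panel, customer_id, churn, tenure and balance
   are hypothetical names; replace them with your own. */
proc glimmix data=churn_panel method=laplace;
   class customer_id;
   /* Binary response with a logit link; models the probability that churn = 1 */
   model churn(event='1') = tenure balance / dist=binary link=logit solution;
   /* One random intercept per customer; customers may contribute
      different numbers of rows */
   random intercept / subject=customer_id;
run;
```

The choice of METHOD=LAPLACE here is only one option. With several hundred thousand customers, fitting time may become a practical concern.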

Posted in reply to noemi_b

07-20-2017 02:24 PM - edited 07-20-2017 02:25 PM

Since you say it is panel data, I assume that you have repeated binary responses over time from each customer. Observations from the same customer are likely correlated, and that correlation should be taken into account in the analysis.

Probably the most common approach is a Generalized Estimating Equations (GEE) logistic model, which can be fit using PROC GEE or PROC GENMOD. Use the REPEATED statement in either procedure, specifying in the SUBJECT= option a variable that has a unique value for each customer. The data set should have multiple observations per customer. Specify the DIST=BINOMIAL option in the MODEL statement to fit a GEE logistic model.

Another approach is a conditional logistic model, which can be fit in PROC LOGISTIC by using your customer variable in the STRATA statement. Either approach allows unequal numbers of responses per customer. Some discussion of these models can be found in:

"Categorical Data Analysis Using SAS, Third Edition" (Stokes, M. et al., SAS Institute, 2012)

"Logistic Regression Using SAS: Theory and Application, Second Edition" (Allison, P., SAS Institute, 2012)

"Fixed Effects Regression Methods for Longitudinal Data Using SAS" (Allison, P., SAS Institute, 2005)
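
The two approaches described in this reply could be set up roughly as follows. This is a minimal sketch, not a recommended specification: the data set `churn_panel` and the variables `customer_id`, `churn`, `tenure`, and `balance` are hypothetical, and the exchangeable working correlation (TYPE=EXCH) is chosen only for illustration.

```sas
/* Sketch only: churn_panel, customer_id, churn, tenure and balance
   are hypothetical names. Expected layout: one row per customer per
   time period, so customers can have different numbers of rows. */

/* GEE logistic model */
proc genmod data=churn_panel;
   class customer_id;
   model churn(event='1') = tenure balance / dist=binomial link=logit;
   /* SUBJECT= identifies the customer; TYPE=EXCH is an assumed
      working correlation structure, not one the reply prescribes */
   repeated subject=customer_id / type=exch;
run;

/* Conditional logistic model: each customer is its own stratum */
proc logistic data=churn_panel;
   strata customer_id;
   model churn(event='1') = tenure balance;
run;
```

Neither call requires the same number of rows per customer. In the conditional model, customers whose response never changes do not contribute to the conditional likelihood, and effects of covariates that are constant within a customer cannot be estimated.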