Solved: Logistic regression with Longitudinal Data

noemi_b · Posted 07-23-2017 04:34 AM

Hi all,

I need some of your valuable insights for my master's thesis, and I hope you can help.

I am trying to build a churn prediction model for a retail bank, and I would like to use regression analysis to do it. In particular, I would like to use a logit model.

My dataset is a panel of 135000 customers whose behavior was tracked over an observation period of T=17 months. I have 50 variables in the dataset. 22 variables are dummies that describe whether or not the client has a given financial product or account, as well as whether or not the client has made any debit or credit transaction on it. The rest are demographic variables plus continuous and discrete variables that describe how the client's earnings, financial position and credit risk have changed over time.

Now, I have finished with the preprocessing phase. I have formatted variables, imputed missing values, delimited my dataset by age in order to eliminate the involuntary churners and created a target variable called "churn".

My questions now are:

1) How can I detect outliers? Does the following code also work for longitudinal data?

/* Predictor list, defined once and reused in the MODEL and KEEP= lists */
%let predictors =
    Customer_id_Ano Month family_id_Ano
    Earning_Cust12M Earning_Fam12M customer_status_Numeric
    account_basic_sum csi customer_trans_count logons meeting_n
    overdraft_facility_sum volume_investments volume_loans
    volume_pension_lifeins volume_savings LTV_current
    Teller_trans_3m_last customer_age first_account_open
    account_save_debit_transaction account_save_credit_transaction
    account_save account_savings_volume
    basic_bank_debit_transaction basic_bank_credit_transaction
    credit_card_debit_transaction credit_card_credit_transaction
    loan_home loan_home_volume meeting_with_client atm_transaction
    dialogue_with_advisor secure_message_sent teller_dialogue
    basic_banking day_to_day_finance ebank_0 home_finance insurance
    investments pension personal_lending savings
    num_adults num_kids customer_gender;

/* Regress churn on all predictors and save the studentized residuals as r */
proc reg data=have;
    title "ABC Outliers";
    model churn = &predictors;
    output out=want (keep=churn &predictors r) rstudent=r;
run;
quit;

/* Drop observations whose studentized residual exceeds 2 in absolute value */
data new;
    set want;
    if abs(r)>2 then delete;
run;

/* Inspect the distribution of the residuals */
proc univariate data=want plots plotsize=30;
    var r;
run;

2) How can I check for multicollinearity among the variables with longitudinal data and logistic regression? Can I use the variance inflation factor (VIF)?

3) Should I split the dataset into training and validation sets? If yes, what code can I use when my dataset is panel data? I must be sure that when SAS splits the dataset into training and validation sets, it leaves the panel structure of my dataset intact.

4) What procedure should I use in SAS to run the logistic regression? I do not think that PROC LOGISTIC is the right choice, as it does not take into account the correlation between the 17 observations within a subject. Is PROC GLIMMIX the one I need?

I know I've asked you many questions, but hopefully you'll be able to help.

Thank you in advance!

sbxkoenk · Posted 07-23-2017 09:00 AM

Hello,

You can use PROC GLIMMIX indeed.
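
As a rough illustration of the GLIMMIX route, here is a minimal sketch of a random-intercept logistic model. It assumes the long-format dataset HAVE from the question (one row per customer per month, CHURN coded 0/1). The handful of predictors is an arbitrary subset picked only to keep the example short, and METHOD=LAPLACE is just one possible estimation choice, not a recommendation.

```sas
/* Sketch only: random-intercept logistic regression for monthly churn. */
/* Assumptions: HAVE is in long format (one row per customer-month),    */
/* CHURN is coded 0/1, and the predictors are an illustrative subset.   */
proc glimmix data=have method=laplace;
    class Customer_id_Ano;
    model churn(event='1') = Month logons customer_age volume_savings
          / dist=binary link=logit solution;
    /* The customer-level random intercept carries the correlation */
    /* between the repeated observations of the same customer.     */
    random intercept / subject=Customer_id_Ano;
run;
```

With 135000 customers this may take a long time to fit, so it is probably worth trying it on a sample of customers first.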

But a good alternative is to use PROC LOGISTIC to fit a “multinomial discrete-time logistic hazard regression” (binary rather than multinomial in your case).

This model also allows for time-dependent and time-varying covariates.
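
A minimal sketch of the binary discrete-time hazard setup could look like the following. It assumes that HAVE has one row per customer per month and that CHURN equals 1 in the month the customer churns (and possibly in later months). The dataset names SORTED and PERSON_PERIOD and the helper variable CHURNED are made up for the illustration, and the predictors are again an arbitrary subset.

```sas
/* Sketch only: binary discrete-time logistic hazard model.           */
/* Assumption: CHURN = 1 in the month of churn (and possibly after).  */
proc sort data=have out=sorted;
    by Customer_id_Ano Month;
run;

/* Keep only the months in which the customer is still at risk */
data person_period;
    set sorted;
    by Customer_id_Ano;
    retain churned;
    if first.Customer_id_Ano then churned = 0;
    if churned then delete;          /* churned in an earlier month */
    if churn = 1 then churned = 1;   /* keep the churn month itself */
    drop churned;
run;

/* Month as a CLASS effect gives a separate baseline hazard per month */
proc logistic data=person_period;
    class Month / param=ref;
    model churn(event='1') = Month logons customer_age volume_savings;
run;
```

Because each customer contributes rows only while still at risk, the fitted probability is the probability of churning in a given month, given no churn before that month.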

See this Enterprise Miner tip:

Tip: Getting Started with Survival Data Mining in SAS® Enterprise Miner™

https://communities.sas.com/t5/SAS-Communities-Library/Tip-Getting-Started-with-Survival-Data-Mining...
