BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
noemi_b
Obsidian | Level 7

Hi all,

I am stuck with a problem that I hope you will help me to solve. 

I am using SAS University Edition v.9.4. and I am working on a panel data that includes 162 variables dealing with the transaction volume and behavior of roughly 130000 bank customers. I want to build a predictive model using logistic regression that can forecast the probability to churn customer have. So dependent variable is 1 if Churn and 0 if Not Churn.

Now, I have to build a logistic regression having as dependent variable the dichotomous variable 1 (churn) and 0 ( No Churn). I have to build it according to this definition:
"We define that the customer who did not do any transaction across all of his accounts and product areas on his own initiative during the last three months is a churner"

My problem is: How can I tell to SAS that if variable X is 0 for three following months- i.e. Jan, Feb,Mar- then my variable "Churn" must be 1, otherwise 0? In other words, if customer Z has been active for example from January to June then my dichotomous variable must be 0 during those months. However, if from July to September this customer made no transaction at all then from October my dichotomous variable must be 1. And it must be one until the end of the observation period.

I know I must use a conditional statement to build this new variable, but I do not know how to tell SAS that it must put 1 to that specific customer if and only if he/she has done no transaction across all of his accounts and product areas during the last three months. 

Please note that I have a variable called "trans_numb" that tells me how many transaction the customer has done across all of his accounts and products areas during a specific month. So, I will use this variable as a benchmark for then constructing my dichotomous variable.

I hope I've given you the idea of what I need and hopefully you can help.

Thank you in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Here's my take:

data have;
input cust_id year month trans_numb;
cards;
1 2017 1 123
1 2017 2 456
1 2017 3 76
1 2017 4 0
1 2017 5 0
1 2017 6 0
1 2017 7 0
1 2017 8 0
1 2017 9 0
1 2017 10 0
1 2017 11 0
1 2017 12 0
2 2017 1 23
2 2017 2 15
2 2017 3 9
2 2017 4 1
2 2017 5 0
2 2017 6 0
2 2017 7 0
2 2017 8 0
2 2017 9 0
2 2017 10 1
2 2017 11 0
2 2017 12 0
3 2017 1 0
3 2017 2 0
3 2017 3 0
3 2017 4 0
3 2017 5 0
3 2017 6 0
3 2017 7 0
3 2017 8 0
3 2017 9 0
3 2017 10 0
3 2017 11 0
3 2017 12 0
4 2017 1 123
4 2017 2 456
4 2017 3 76
4 2017 4 65
4 2017 5 88
4 2017 6 90
4 2017 7 78
4 2017 8 76
4 2017 9 65
4 2017 10 89
4 2017 11 86
4 2017 12 79
5 2017 1 345
5 2017 2 456
5 2017 3 125
5 2017 4 65
5 2017 5 78
5 2017 6 100
5 2017 7 76
5 2017 8 78
5 2017 9 98
5 2017 10 99
5 2017 11 65
5 2017 12 45
6 2017 1 0
6 2017 2 0
6 2017 3 2
6 2017 4 0
6 2017 5 0
6 2017 6 0
6 2017 7 1
6 2017 8 0
6 2017 9 0
6 2017 10 0
6 2017 11 0
6 2017 12 0
7 2017 1 45
7 2017 2 36
7 2017 3 10
7 2017 4 7
7 2017 5 0
7 2017 6 0
7 2017 7 0
7 2017 8 0
7 2017 9 0
7 2017 10 0
7 2017 11 0
7 2017 12 0
;
run;

data want1; /* "simple" method */
set have;
by cust_id year month;
retain cust_count old_churn;
if first.cust_id
then cust_count = 1;
else cust_count + 1;
trans1 = lag(trans_numb);
trans2 = lag2(trans_numb);
if cust_count < 3
then churn = 0;
else do;
  if trans1 or trans2 or trans_numb
  then churn = 0;
  else churn = 1;
end;
old_churn = churn;
* drop old_churn cust_count trans1 trans2;
run;

proc print data=want1 noobs;
run;

data want2; /* "complex" method */
set have;
by cust_id year month;
retain cust_count old_churn;
if first.cust_id
then cust_count = 1;
else cust_count + 1;
trans1 = lag(trans_numb);
trans2 = lag2(trans_numb);
if cust_count < 3
then churn = 0;
else do;
  if not trans1 and not trans2 and not trans_numb
  then churn = 1;
  else if trans1 and trans2 and trans_numb
  then churn = 0;
  else if old_churn
  then churn = 1;
  else churn = 0;
end;
old_churn = churn;
* drop old_churn cust_count trans1 trans2;
run;

proc print data=want2 noobs;
run;

Note that I named the first dataset "have"; I'm so used to name datasets in examples have and want that I keep it that way. 😉

Most of the logic is done by using the lag() functions and a counter for the current customer ID. Adding year and month in the by statement is done to make sure that the dataset is correctly sorted (if not, the step throws an ERROR).

 

Once the code runs to your satisfaction, activate the drop statement to get rid of the helper variables.

View solution in original post

10 REPLIES 10
noemi_b
Obsidian | Level 7

Thank you for the kind and prompt reply @Kurt_Bremser! I apologize for not having provided an example of my dataset, but I'm a newbie in SAS community and I still have to learn how to properly ask questions. However, my dataset looks like the example you provided in your second comment. 

The objective of my work is to develop a predictive model that can help a bank to detectlikely churners before they leave in order to finally retain them. In my dataset, the clients associated to a transaction volume of 0 for more than three-months usually make only one or two transactions in a random subsequent month. And not all of them do it- most of them simply keep doing no transactions. If a customer only has one bank, it is highly uncommon that he makes no or very few transactions across accounts or financial products in an extended timeframe- i.e. 3-month-. For this reason, I would still regard them as being prone to churn. So I would say that if a customer makes a transaction after a three-month pause the churn variable should still be kept at 1.

 

However, if I'm not asking too much, can you please provide the SAS code for both scenarios? 


1- if customer makes a transaction after a three-month pause AND churn is kept 1

AND

2-  if customer makes a transaction after a three-month pause AND churn is reset to 0

Thank you a lot!!

noemi_b
Obsidian | Level 7

@Kurt_Bremser Unfortunately, I cannot post the full data step of my dataset due to size and confidentiality matters. However, below a very small data step of six customers that includes all the possible scenarios I have in my full dataset:

 

data bank;
input cust_id year month trans_numb;
cards;
1 2017 1 123
1 2017 2 456
1 2017 3 76
1 2017 4 0
1 2017 5 0
1 2017 6 0
1 2017 7 0
1 2017 8 0
1 2017 9 0
1 2017 10 0
1 2017 11 0
1 2017 12 0
2 2017 1 23
2 2017 2 15
2 2017 3 9
2 2017 4 1
2 2017 5 0
2 2017 6 0
2 2017 7 0
2 2017 8 0
2 2017 9 0
2 2017 10 1
2 2017 11 0
2 2017 12 0
3 2017 1 0
3 2017 2 0
3 2017 3 0
3 2017 4 0
3 2017 5 0
3 2017 6 0
3 2017 7 0
3 2017 8 0
3 2017 9 0
3 2017 10 0
3 2017 11 0
3 2017 12 0
4 2017 1 123
4 2017 2 456
4 2017 3 76
4 2017 4 65
4 2017 5 88
4 2017 6 90
4 2017 7 78
4 2017 8 76
4 2017 9 65
4 2017 10 89
4 2017 11 86
4 2017 12 79
5 2017 1 345
5 2017 2 456
5 2017 3 125
5 2017 4 65
5 2017 5 78
5 2017 6 100
5 2017 7 76
5 2017 8 78
5 2017 9 98
5 2017 10 99
5 2017 11 65
5 2017 12 45
6 2017 1 0
6 2017 2 0
6 2017 3 2
6 2017 4 0
6 2017 5 0
6 2017 6 0
6 2017 7 1
6 2017 8 0
6 2017 9 0
6 2017 10 0
6 2017 11 0
6 2017 12 0
7 2017 1 45
7 2017 2 36
7 2017 3 10
7 2017 4 7
7 2017 5 0
7 2017 6 0
7 2017 7 0
7 2017 8 0
7 2017 9 0
7 2017 10 0
7 2017 11 0
7 2017 12 0
;
run;

 

As you can see, I have four types of customers:

1- Customers that simply remain active for the whole period (see cust_id 4). In this case, the new variable churn should be 0 for the whole observation period

2- Customers that first are active and then stop doing transactions until the end of the observation period (see cust_id 7). In this case, the new variable churn should first be 0, then it should turn into 1 after a three-month pause.

3- Customers that do not do any transactions for the whole observation period (See cust_id 3). In this case, the variable churn should be 1 after the three-month pause.

4- Customers that are first active then take a three-month pause and then do 1 or 2 transactions on a random month and then return inactive (See cust_id 2). In this case if you can give me two options. 1st option--> the variable churn remains 1; 2nd option--> the variable churn reset and return being 1 again after the second three-month pause.

I hope I've given you an idea you can play around with. 

Thank you so much!!!

Kurt_Bremser
Super User

Here's my take:

data have;
input cust_id year month trans_numb;
cards;
1 2017 1 123
1 2017 2 456
1 2017 3 76
1 2017 4 0
1 2017 5 0
1 2017 6 0
1 2017 7 0
1 2017 8 0
1 2017 9 0
1 2017 10 0
1 2017 11 0
1 2017 12 0
2 2017 1 23
2 2017 2 15
2 2017 3 9
2 2017 4 1
2 2017 5 0
2 2017 6 0
2 2017 7 0
2 2017 8 0
2 2017 9 0
2 2017 10 1
2 2017 11 0
2 2017 12 0
3 2017 1 0
3 2017 2 0
3 2017 3 0
3 2017 4 0
3 2017 5 0
3 2017 6 0
3 2017 7 0
3 2017 8 0
3 2017 9 0
3 2017 10 0
3 2017 11 0
3 2017 12 0
4 2017 1 123
4 2017 2 456
4 2017 3 76
4 2017 4 65
4 2017 5 88
4 2017 6 90
4 2017 7 78
4 2017 8 76
4 2017 9 65
4 2017 10 89
4 2017 11 86
4 2017 12 79
5 2017 1 345
5 2017 2 456
5 2017 3 125
5 2017 4 65
5 2017 5 78
5 2017 6 100
5 2017 7 76
5 2017 8 78
5 2017 9 98
5 2017 10 99
5 2017 11 65
5 2017 12 45
6 2017 1 0
6 2017 2 0
6 2017 3 2
6 2017 4 0
6 2017 5 0
6 2017 6 0
6 2017 7 1
6 2017 8 0
6 2017 9 0
6 2017 10 0
6 2017 11 0
6 2017 12 0
7 2017 1 45
7 2017 2 36
7 2017 3 10
7 2017 4 7
7 2017 5 0
7 2017 6 0
7 2017 7 0
7 2017 8 0
7 2017 9 0
7 2017 10 0
7 2017 11 0
7 2017 12 0
;
run;

data want1; /* "simple" method */
set have;
by cust_id year month;
retain cust_count old_churn;
if first.cust_id
then cust_count = 1;
else cust_count + 1;
trans1 = lag(trans_numb);
trans2 = lag2(trans_numb);
if cust_count < 3
then churn = 0;
else do;
  if trans1 or trans2 or trans_numb
  then churn = 0;
  else churn = 1;
end;
old_churn = churn;
* drop old_churn cust_count trans1 trans2;
run;

proc print data=want1 noobs;
run;

data want2; /* "complex" method */
set have;
by cust_id year month;
retain cust_count old_churn;
if first.cust_id
then cust_count = 1;
else cust_count + 1;
trans1 = lag(trans_numb);
trans2 = lag2(trans_numb);
if cust_count < 3
then churn = 0;
else do;
  if not trans1 and not trans2 and not trans_numb
  then churn = 1;
  else if trans1 and trans2 and trans_numb
  then churn = 0;
  else if old_churn
  then churn = 1;
  else churn = 0;
end;
old_churn = churn;
* drop old_churn cust_count trans1 trans2;
run;

proc print data=want2 noobs;
run;

Note that I named the first dataset "have"; I'm so used to name datasets in examples have and want that I keep it that way. 😉

Most of the logic is done by using the lag() functions and a counter for the current customer ID. Adding year and month in the by statement is done to make sure that the dataset is correctly sorted (if not, the step throws an ERROR).

 

Once the code runs to your satisfaction, activate the drop statement to get rid of the helper variables.

noemi_b
Obsidian | Level 7

@Kurt_Bremser I can't thank you enough!! You're my savior Smiley Tongue

noemi_b
Obsidian | Level 7

Dear @Kurt_Bremser,

Apologies for partially reopening this post but I need your precious help once more. In the codes your created, the variable churn is equal to 1 if the customer has done no transactions (so trans_numb=0) during the last three months. And this is right!

 

However, I initially told you that if i.e. customer 1 has done no transations across all his accounts during the months January, February and March then the variable churn should start being one from the month March. Now, I need the following code edit: 

"if i.e. customer 1 has done no transations across all his accounts during the months January, February and March then the variable churn should start being 1 NOT from the month March but from January".

 

Following a small datastep that describes how I would like my variable churn to look like:

 

data have;
input cust_id year month trans_numb churn;
cards;
1 2017 1 123 0
1 2017 2 456 0
1 2017 3 76 0
1 2017 4 0 1
1 2017 5 0 1
1 2017 6 0 1
1 2017 7 0 1
1 2017 8 0 1
1 2017 9 0 1
1 2017 10 0 1
1 2017 11 0 1
1 2017 12 0 1
2 2017 1 23
2 2017 2 15
2 2017 3 9
2 2017 4 1
2 2017 5 0 1
2 2017 6 0 1
2 2017 7 0 1
2 2017 8 0 1
2 2017 9 0 1
2 2017 10 1 0
2 2017 11 0 0
2 2017 12 0 0
3 2017 1 2 0
3 2017 2 0 0
3 2017 3 0 0
3 2017 4 1 0
3 2017 5 0 1
3 2017 6 0 1
3 2017 7 0 1
3 2017 8 0 1
3 2017 9 0 1
3 2017 10 2 0
3 2017 11 0 0
3 2017 12 0 0
4 2017 1 123 0
4 2017 2 456 0
4 2017 3 76 0
4 2017 4 65 0
4 2017 5 88 0
4 2017 6 90 0
4 2017 7 78 0
4 2017 8 76 0
4 2017 9 65 0
4 2017 10 0 0
4 2017 11 85 0
4 2017 12 79 0
5 2017 1 345 0
5 2017 2 456 0
5 2017 3 125 0
5 2017 4 65 0
5 2017 5 78 0
5 2017 6 100 0
5 2017 7 76 0
5 2017 8 32 0
5 2017 9 15 0
5 2017 10 0 1
5 2017 11 0 1
5 2017 12 0 1
;
run;

Can you help me again? If you have any further questions do not hesitate to ask!

Thank you a lot in advance!!! 

Kurt_Bremser
Super User

Since we can't "go back" in a current data step, there's usually two solutions: reverse sort the dataset so you can again use the lag() functions, or keep a flag in an extra dataset and remerge that.

I selected the second option:

data have;
input cust_id year month trans_numb;
cards;
1 2017 1 123
1 2017 2 456
1 2017 3 76
1 2017 4 0
1 2017 5 0
1 2017 6 0
1 2017 7 0
1 2017 8 0
1 2017 9 0
1 2017 10 0
1 2017 11 0
1 2017 12 0
2 2017 1 23
2 2017 2 15
2 2017 3 9
2 2017 4 1
2 2017 5 0
2 2017 6 0
2 2017 7 0
2 2017 8 0
2 2017 9 0
2 2017 10 1
2 2017 11 0
2 2017 12 0
3 2017 1 0
3 2017 2 0
3 2017 3 0
3 2017 4 0
3 2017 5 0
3 2017 6 0
3 2017 7 0
3 2017 8 0
3 2017 9 0
3 2017 10 0
3 2017 11 0
3 2017 12 0
4 2017 1 123
4 2017 2 456
4 2017 3 76
4 2017 4 65
4 2017 5 88
4 2017 6 90
4 2017 7 78
4 2017 8 76
4 2017 9 65
4 2017 10 89
4 2017 11 86
4 2017 12 79
5 2017 1 345
5 2017 2 456
5 2017 3 125
5 2017 4 65
5 2017 5 78
5 2017 6 100
5 2017 7 76
5 2017 8 78
5 2017 9 98
5 2017 10 99
5 2017 11 65
5 2017 12 45
6 2017 1 0
6 2017 2 0
6 2017 3 2
6 2017 4 0
6 2017 5 0
6 2017 6 0
6 2017 7 1
6 2017 8 0
6 2017 9 0
6 2017 10 0
6 2017 11 0
6 2017 12 0
7 2017 1 45
7 2017 2 36
7 2017 3 10
7 2017 4 7
7 2017 5 0
7 2017 6 0
7 2017 7 0
7 2017 8 0
7 2017 9 0
7 2017 10 0
7 2017 11 0
7 2017 12 0
;
run;

data want2 start_ctl (keep=cust_id); /* "complex" method */
set have;
by cust_id year month;
retain cust_count old_churn;
if first.cust_id
then cust_count = 1;
else cust_count + 1;
trans1 = lag(trans_numb);
trans2 = lag2(trans_numb);
if cust_count < 3
then churn = 0;
else do;
  if not trans1 and not trans2 and not trans_numb
  then do;
    churn = 1;
    if cust_count = 3 then output start_ctl;
  end;
  else if trans1 and trans2 and trans_numb
  then churn = 0;
  else if old_churn
  then churn = 1;
  else churn = 0;
end;
old_churn = churn;
drop old_churn cust_count trans1 trans2;
output want2;
run;

data want;
merge
  want2 (in=a)
  start_ctl (in=b)
;
by cust_id;
if a;
if first.cust_id
then counter = 1;
else counter + 1;
if b and counter < 3 then churn = 1;
drop counter;
run;

proc print data=want noobs;
run;
Kurt_Bremser
Super User

PS consider supplying some example data in a form like this:

data have;
input cust_id year month trans_numb;
cards;
1 2017 1 123
1 2017 2 456
1 2017 3 76
1 2017 4 0
1 2017 5 0
1 2017 6 0
;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 3184 views
  • 4 likes
  • 2 in conversation