Hello Everyone,
I am looking to create a sequence variable based on two columns : Cust_name and Year.
There are different type of customers as below:
Customer1 was with me from 2001 to 2003
Customer2 was with me only for 2001 and 2002
Customer3 was only for one year 2001
Customer4 was with me alternate year 2003,2005,2007
Customer5 2001,2002,2005,2006,2007
I want to find maximum number of years I was able to provide service to my customers.
So, Output I am looking is:
Customer1 output as 3 (as he was 2001,2002,2003)
Customer2 as 2 (as he was 2001,2002)
Customer3 as 1 (as he was only for yr 2001)
Customer4 as 1 (alternate yr so no continuous year)
Customer5 as 3 ( he was there continuously 2(2001,2002) and 3(2005,2006,2007) so I will choose max as 3).
PFB dataset for your reference:
data abc;
input customer_id Customer_name $ 3 - 11 transaction_date :mmddyy10. ;
format transaction_date mmddyy10.;
datalines;
1 Customer1 03/31/2001
1 Customer1 03/31/2002
1 Customer1 03/31/2003
2 Customer2 03/31/2001
2 Customer2 03/31/2002
3 Customer3 03/31/2001
4 Customer4 03/31/2003
4 Customer4 03/31/2005
4 Customer4 03/31/2007
5 Customer5 03/31/2001
5 Customer5 03/31/2002
5 Customer5 03/31/2005
5 Customer5 03/31/2006
5 Customer5 03/31/2007
;
run;
proc print data=abc;
run;
Cust_Name | Year | Output1_Sequence | Final_output(max(Output1 by Cust_name)) |
1 | 2001 | 1 | 3 |
1 | 2002 | 2 | 3 |
1 | 2003 | 3 | 3 |
2 | 2001 | 1 | 2 |
2 | 2002 | 2 | 2 |
3 | 2001 | 1 | 1 |
4 | 2003 | 1 | 1 |
4 | 2005 | 1 | 1 |
4 | 2007 | 1 | 1 |
5 | 2001 | 1 | 3 |
5 | 2002 | 2 | 3 |
5 | 2005 | 1 | 3 |
5 | 2006 | 2 | 3 |
5 | 2007 | 3 | 3 |
Looking forward for the insights.
Thanks in advance!
Below one way that should work for you.
data have;
input customer_id Customer_name $ 3 - 11 transaction_date :mmddyy10.;
format transaction_date mmddyy10.;
datalines;
1 Customer1 03/31/2001
1 Customer1 03/31/2002
1 Customer1 05/31/2002
1 Customer1 03/31/2003
2 Customer2 03/31/2001
2 Customer2 03/31/2002
3 Customer3 03/31/2001
4 Customer4 03/31/2003
4 Customer4 03/31/2005
4 Customer4 03/31/2007
5 Customer5 03/31/2001
5 Customer5 03/31/2002
5 Customer5 03/31/2005
5 Customer5 03/31/2006
5 Customer5 03/31/2007
;
data inter(keep=customer_id max_cnt);
set have;
by customer_id transaction_date;
retain max_cnt;
_lag_dt=lag(transaction_date);
if first.customer_id or intck('year',_lag_dt,transaction_date) > 1 then
do;
_cnt=1;
max_cnt=1;
end;
else
if intck('year',_lag_dt,transaction_date) = 1 then
do;
_cnt+1;
max_cnt=max(_cnt,max_cnt);
end;
if last.customer_id then output;
run;
data want;
length seq_nr 8;
merge have inter;
by customer_id;
if first.customer_id then seq_nr=1;
else seq_nr+1;
run;
proc print data=want;
run;
1. Sort data by customer and date (proc sort)
2. Create a year from your transaction date (year() function)
3. Use PROC SORT or FREQ to remove duplicate years
4. Using a data step, check if the years are continuous (DIF())
5. Assign codes based on results for step #4.
This is the approach I'd use for someone who's relatively new to SAS.
@AkshayS wrote:
Hello Everyone,
I am looking to create a sequence variable based on two columns : Cust_name and Year.
There are different type of customers as below:
Customer1 was with me from 2001 to 2003
Customer2 was with me only for 2001 and 2002
Customer3 was only for one year 2001
Customer4 was with me alternate year 2003,2005,2007
Customer5 2001,2002,2005,2006,2007
I want to find maximum number of years I was able to provide service to my customers.
So, Output I am looking is:
Customer1 output as 3 (as he was 2001,2002,2003)
Customer2 as 2 (as he was 2001,2002)
Customer3 as 1 (as he was only for yr 2001)
Customer4 as 1 (alternate yr so no continuous year)
Customer5 as 3 ( he was there continuously 2(2001,2002) and 3(2005,2006,2007) so I will choose max as 3).
PFB dataset for your reference:
data abc;
input customer_id Customer_name $ 3 - 11 transaction_date :mmddyy10. ;
format transaction_date mmddyy10.;
datalines;
1 Customer1 03/31/2001
1 Customer1 03/31/2002
1 Customer1 03/31/2003
2 Customer2 03/31/2001
2 Customer2 03/31/2002
3 Customer3 03/31/2001
4 Customer4 03/31/2003
4 Customer4 03/31/2005
4 Customer4 03/31/2007
5 Customer5 03/31/2001
5 Customer5 03/31/2002
5 Customer5 03/31/2005
5 Customer5 03/31/2006
5 Customer5 03/31/2007
;
run;
proc print data=abc;
run;
Cust_Name Year Output1_Sequence Final_output(max(Output1 by Cust_name)) 1 2001 1 3 1 2002 2 3 1 2003 3 3 2 2001 1 2 2 2002 2 2 3 2001 1 1 4 2003 1 1 4 2005 1 1 4 2007 1 1 5 2001 1 3 5 2002 2 3 5 2005 1 3 5 2006 2 3 5 2007 3 3
Looking forward for the insights.
Thanks in advance!
Thanks for responding, but i have follwed same steps but it is not helping as there is different behaviour of customer as shared in example.
How is it not helping? Which cases didn't it consider? Please show your work?
Hi there,
Please find the below approach taken to deal with all the types of customer behaviour in my data.
data abc;
input customer_id Customer_name $ 3 - 11 transaction_date :mmddyy10. ;
format transaction_date mmddyy10.;
datalines;
1 Customer1 03/31/2001
1 Customer1 03/31/2002
1 Customer1 03/31/2003
2 Customer2 03/31/2001
2 Customer2 03/31/2002
3 Customer3 03/31/2001
4 Customer4 03/31/2003
4 Customer4 03/31/2005
4 Customer4 03/31/2007
5 Customer5 03/31/2001
5 Customer5 03/31/2002
5 Customer5 03/31/2005
5 Customer5 03/31/2006
5 Customer5 03/31/2007
;
run;
proc sql;
create table t1 as
select Customer_name,year(transaction_date) as year
from abc;
quit;
proc sql;
create table t2 as
select Customer_name,year
from t1;
quit;
data t3;
retain customer_name customer year lyear out;
set t2;
keep retain customer_name customer year lyear out;
customer=lag(customer_name);
lyear=lag(year);
out=(customer_name=customer) and (year ne (lyear+1)) ;
proc sql;
create table t4 as
select *,
case when out=0 and customer_name=customer then 1 else 0 end as sequence
from t3;
quit;
data t5;
set t4;
by Customer_name ;
retain sequence1 0 ;
if first.Customer_name then sequence1=1;
else sequence1=sequence;
run;
data t6;
set t5;
by Customer_name ;
retain sequence2 0 ;
if first.Customer_name then sequence2=sequence1;
else sequence2+sequence1;
run;
proc sql;
create table t7 as
select Customer_name,year,
case when out=1 and customer_name=customer and year>lyear+1
then 1 else sequence2 end as sequence3
from t6;
quit;
proc print;
run;
Output :
customer_name | year | sequence3 |
Customer1 | 2001 | 1 |
Customer1 | 2002 | 2 |
Customer1 | 2003 | 3 |
Customer2 | 2001 | 1 |
Customer2 | 2002 | 2 |
Customer3 | 2001 | 1 |
Customer4 | 2003 | 1 |
Customer4 | 2005 | 1 |
Customer4 | 2007 | 1 |
Customer5 | 2001 | 1 |
Customer5 | 2002 | 2 |
Customer5 | 2005 | 1 |
Customer5 | 2006 | 3 |
Customer5 | 2007 | 4 |
The customer behaviour is that he is getting service for 2001,2002 and the he disappears for 2yrs and comes back in 2005 onwards. So the sequence creation is getting disturbed.
customer_name | year | sequence3 |
Customer5 | 2001 | 1 |
Customer5 | 2002 | 2 |
Customer5 | 2005 | 1 |
Customer5 | 2006 | 3 |
Customer5 | 2007 | 4 |
If you help me in fixing the sequence of such customer, my issue will be solved.
Thanks for your guidance. Please correct me if I am wrong at any step.
Another approach, But facing same issue with customer type 5
data abc;
input customer_id Customer_name $ 3 - 11 transaction_date :mmddyy10. ;
format transaction_date mmddyy10.;
datalines;
1 Customer1 03/31/2001
1 Customer1 03/31/2002
1 Customer1 03/31/2003
2 Customer2 03/31/2001
2 Customer2 03/31/2002
3 Customer3 03/31/2001
4 Customer4 03/31/2003
4 Customer4 03/31/2005
4 Customer4 03/31/2007
5 Customer5 03/31/2001
5 Customer5 03/31/2002
5 Customer5 03/31/2005
5 Customer5 03/31/2006
5 Customer5 03/31/2007
;
run;
proc sql;
create table t1 as
select Customer_name,year(transaction_date) as year
from abc;
quit;
proc sql;
create table t2 as
select Customer_name,year
from t1;
quit;
data t3;
retain customer_name customer year lyear out;
set t2;
keep retain customer_name customer year lyear out;
customer=lag(customer_name);
lyear=lag(year);
out=(customer_name=customer) and (year ne (lyear+1)) ;
data t4;
set t3;
by Customer_name ;
if first.Customer_name then SeqN=1;
else SeqN + 1;
drop Customer ;
proc sql;
create table t5 as
select Customer_name,year,SeqN,out,
case when out=0 then SeqN else out end as sequence
from t4;
quit;
proc print;
run;
Below one way that should work for you.
data have;
input customer_id Customer_name $ 3 - 11 transaction_date :mmddyy10.;
format transaction_date mmddyy10.;
datalines;
1 Customer1 03/31/2001
1 Customer1 03/31/2002
1 Customer1 05/31/2002
1 Customer1 03/31/2003
2 Customer2 03/31/2001
2 Customer2 03/31/2002
3 Customer3 03/31/2001
4 Customer4 03/31/2003
4 Customer4 03/31/2005
4 Customer4 03/31/2007
5 Customer5 03/31/2001
5 Customer5 03/31/2002
5 Customer5 03/31/2005
5 Customer5 03/31/2006
5 Customer5 03/31/2007
;
data inter(keep=customer_id max_cnt);
set have;
by customer_id transaction_date;
retain max_cnt;
_lag_dt=lag(transaction_date);
if first.customer_id or intck('year',_lag_dt,transaction_date) > 1 then
do;
_cnt=1;
max_cnt=1;
end;
else
if intck('year',_lag_dt,transaction_date) = 1 then
do;
_cnt+1;
max_cnt=max(_cnt,max_cnt);
end;
if last.customer_id then output;
run;
data want;
length seq_nr 8;
merge have inter;
by customer_id;
if first.customer_id then seq_nr=1;
else seq_nr+1;
run;
proc print data=want;
run;
Thank you so much Patrick! It is working for all different set of combinations, which I tried more for testing.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.