BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AkshayS
Fluorite | Level 6

Hello Everyone,

I am looking to create a sequence variable based on two columns : Cust_name and Year.
There are different type of customers as below:
Customer1 was with me from 2001 to 2003
Customer2 was with me only for 2001 and 2002
Customer3 was only for one year 2001
Customer4 was with me alternate year 2003,2005,2007
Customer5 2001,2002,2005,2006,2007

I want to find maximum number of years I was able to provide service to my customers.
So, Output I am looking is:
Customer1 output as 3 (as he was 2001,2002,2003)
Customer2 as 2 (as he was 2001,2002)
Customer3 as 1 (as he was only for yr 2001)
Customer4 as 1 (alternate yr so no continuous year)
Customer5 as 3 ( he was there continuously 2(2001,2002) and 3(2005,2006,2007) so I will choose max as 3).


PFB dataset for your reference:

data abc;
input customer_id Customer_name $ 3 - 11 transaction_date :mmddyy10. ;
format transaction_date mmddyy10.;
datalines;
1 Customer1 03/31/2001
1 Customer1 03/31/2002
1 Customer1 03/31/2003
2 Customer2 03/31/2001
2 Customer2 03/31/2002
3 Customer3 03/31/2001
4 Customer4 03/31/2003
4 Customer4 03/31/2005
4 Customer4 03/31/2007
5 Customer5 03/31/2001
5 Customer5 03/31/2002
5 Customer5 03/31/2005
5 Customer5 03/31/2006
5 Customer5 03/31/2007
;
run;

proc print data=abc;
run;

 

Cust_NameYearOutput1_SequenceFinal_output(max(Output1 by Cust_name))
1200113
1200223
1200333
2200112
2200222
3200111
4200311
4200511
4200711
5200113
5200223
5200513
5200623
5200733



Looking forward for the insights.
Thanks in advance! 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Below one way that should work for you.

data have;
  input customer_id Customer_name $ 3 - 11 transaction_date :mmddyy10.;
  format transaction_date mmddyy10.;
  datalines;
1 Customer1 03/31/2001
1 Customer1 03/31/2002
1 Customer1 05/31/2002
1 Customer1 03/31/2003
2 Customer2 03/31/2001
2 Customer2 03/31/2002
3 Customer3 03/31/2001
4 Customer4 03/31/2003
4 Customer4 03/31/2005
4 Customer4 03/31/2007
5 Customer5 03/31/2001
5 Customer5 03/31/2002
5 Customer5 03/31/2005
5 Customer5 03/31/2006
5 Customer5 03/31/2007
;

data inter(keep=customer_id max_cnt);
  set have;
  by customer_id transaction_date;
  retain max_cnt;
  _lag_dt=lag(transaction_date);

  if first.customer_id or intck('year',_lag_dt,transaction_date) > 1 then 
    do;
      _cnt=1;
      max_cnt=1;
    end;
  else 
  if intck('year',_lag_dt,transaction_date) = 1 then
    do;
      _cnt+1;
      max_cnt=max(_cnt,max_cnt);
    end;

  if last.customer_id then output;
run;

data want;
  length seq_nr 8;
  merge have inter;
  by customer_id;
  if first.customer_id then seq_nr=1;
  else seq_nr+1;
run;

proc print data=want;
run;

Patrick_0-1593305300097.png

 

View solution in original post

7 REPLIES 7
Reeza
Super User

1. Sort data by customer and date (proc sort)
2. Create a year from your transaction date (year() function)
3. Use PROC SORT or FREQ to remove duplicate years
4. Using a data step, check if the years are continuous (DIF())
5. Assign codes based on results for step #4.

This is the approach I'd use for someone who's relatively new to SAS.

 

Spoiler

@AkshayS wrote:

Hello Everyone,

I am looking to create a sequence variable based on two columns : Cust_name and Year.
There are different type of customers as below:
Customer1 was with me from 2001 to 2003
Customer2 was with me only for 2001 and 2002
Customer3 was only for one year 2001
Customer4 was with me alternate year 2003,2005,2007
Customer5 2001,2002,2005,2006,2007

I want to find maximum number of years I was able to provide service to my customers.
So, Output I am looking is:
Customer1 output as 3 (as he was 2001,2002,2003)
Customer2 as 2 (as he was 2001,2002)
Customer3 as 1 (as he was only for yr 2001)
Customer4 as 1 (alternate yr so no continuous year)
Customer5 as 3 ( he was there continuously 2(2001,2002) and 3(2005,2006,2007) so I will choose max as 3).


PFB dataset for your reference:

data abc;
input customer_id Customer_name $ 3 - 11 transaction_date :mmddyy10. ;
format transaction_date mmddyy10.;
datalines;
1 Customer1 03/31/2001
1 Customer1 03/31/2002
1 Customer1 03/31/2003
2 Customer2 03/31/2001
2 Customer2 03/31/2002
3 Customer3 03/31/2001
4 Customer4 03/31/2003
4 Customer4 03/31/2005
4 Customer4 03/31/2007
5 Customer5 03/31/2001
5 Customer5 03/31/2002
5 Customer5 03/31/2005
5 Customer5 03/31/2006
5 Customer5 03/31/2007
;
run;

proc print data=abc;
run;

 

Cust_Name Year Output1_Sequence Final_output(max(Output1 by Cust_name))
1 2001 1 3
1 2002 2 3
1 2003 3 3
2 2001 1 2
2 2002 2 2
3 2001 1 1
4 2003 1 1
4 2005 1 1
4 2007 1 1
5 2001 1 3
5 2002 2 3
5 2005 1 3
5 2006 2 3
5 2007 3 3



Looking forward for the insights.
Thanks in advance! 

 


AkshayS
Fluorite | Level 6

Thanks for responding, but i have follwed same steps but it is not helping as there is different behaviour of customer as shared in example.

Reeza
Super User

How is it not helping? Which cases didn't it consider? Please show your work?

 

AkshayS
Fluorite | Level 6

Hi there,

 

Please find the below approach taken to deal with all the types of customer behaviour in my data.

 

data abc;
input customer_id Customer_name $ 3 - 11 transaction_date :mmddyy10. ;
format transaction_date mmddyy10.;
datalines;
1 Customer1 03/31/2001
1 Customer1 03/31/2002
1 Customer1 03/31/2003
2 Customer2 03/31/2001
2 Customer2 03/31/2002
3 Customer3 03/31/2001
4 Customer4 03/31/2003
4 Customer4 03/31/2005
4 Customer4 03/31/2007
5 Customer5 03/31/2001
5 Customer5 03/31/2002
5 Customer5 03/31/2005
5 Customer5 03/31/2006
5 Customer5 03/31/2007
;
run;


proc sql;
create table t1 as
select Customer_name,year(transaction_date) as year
from abc;
quit;


proc sql;
create table t2 as
select Customer_name,year
from t1;
quit;

 

data t3;
retain customer_name customer year lyear out;
set t2;
keep retain customer_name customer year lyear out;
customer=lag(customer_name);
lyear=lag(year);
out=(customer_name=customer) and (year ne (lyear+1)) ;


proc sql;
create table t4 as
select *,
case when out=0 and customer_name=customer then 1 else 0 end as sequence
from t3;
quit;

 

data t5;
set t4;
by Customer_name ;
retain sequence1 0 ;
if first.Customer_name then sequence1=1;
else sequence1=sequence;
run;


data t6;
set t5;
by Customer_name ;
retain sequence2 0 ;
if first.Customer_name then sequence2=sequence1;
else sequence2+sequence1;
run;


proc sql;
create table t7 as
select Customer_name,year,
case when out=1 and customer_name=customer and year>lyear+1
then 1 else sequence2 end as sequence3
from t6;
quit;

proc print;
run;

 

Output : 

customer_nameyearsequence3
Customer120011
Customer120022
Customer120033
Customer220011
Customer220022
Customer320011
Customer420031
Customer420051
Customer420071
Customer520011
Customer520022
Customer520051
Customer520063
Customer520074

 

The customer behaviour is that he is getting service for 2001,2002 and the he disappears for 2yrs and comes back in 2005 onwards. So the sequence creation is getting disturbed.

 

customer_nameyearsequence3
Customer520011
Customer520022
Customer520051
Customer520063
Customer520074

 

If you help me in fixing the sequence of such customer, my issue will be solved.

 

Thanks for your guidance. Please correct me if I am wrong at any step.

 

AkshayS
Fluorite | Level 6

Another approach, But facing same issue with customer type 5

 

 

data abc;
input customer_id Customer_name $ 3 - 11 transaction_date :mmddyy10. ;
format transaction_date mmddyy10.;
datalines;
1 Customer1 03/31/2001
1 Customer1 03/31/2002
1 Customer1 03/31/2003
2 Customer2 03/31/2001
2 Customer2 03/31/2002
3 Customer3 03/31/2001
4 Customer4 03/31/2003
4 Customer4 03/31/2005
4 Customer4 03/31/2007
5 Customer5 03/31/2001
5 Customer5 03/31/2002
5 Customer5 03/31/2005
5 Customer5 03/31/2006
5 Customer5 03/31/2007
;
run;


proc sql;
create table t1 as
select Customer_name,year(transaction_date) as year
from abc;
quit;


proc sql;
create table t2 as
select Customer_name,year
from t1;
quit;

 

data t3;
retain customer_name customer year lyear out;
set t2;
keep retain customer_name customer year lyear out;
customer=lag(customer_name);
lyear=lag(year);
out=(customer_name=customer) and (year ne (lyear+1)) ;

 

data t4;
set t3;
by Customer_name ;
if first.Customer_name then SeqN=1;
else SeqN + 1;
drop Customer ;

 

proc sql;
create table t5 as
select Customer_name,year,SeqN,out,
case when out=0 then SeqN else out end as sequence
from t4;
quit;
proc print;
run;

Patrick
Opal | Level 21

Below one way that should work for you.

data have;
  input customer_id Customer_name $ 3 - 11 transaction_date :mmddyy10.;
  format transaction_date mmddyy10.;
  datalines;
1 Customer1 03/31/2001
1 Customer1 03/31/2002
1 Customer1 05/31/2002
1 Customer1 03/31/2003
2 Customer2 03/31/2001
2 Customer2 03/31/2002
3 Customer3 03/31/2001
4 Customer4 03/31/2003
4 Customer4 03/31/2005
4 Customer4 03/31/2007
5 Customer5 03/31/2001
5 Customer5 03/31/2002
5 Customer5 03/31/2005
5 Customer5 03/31/2006
5 Customer5 03/31/2007
;

data inter(keep=customer_id max_cnt);
  set have;
  by customer_id transaction_date;
  retain max_cnt;
  _lag_dt=lag(transaction_date);

  if first.customer_id or intck('year',_lag_dt,transaction_date) > 1 then 
    do;
      _cnt=1;
      max_cnt=1;
    end;
  else 
  if intck('year',_lag_dt,transaction_date) = 1 then
    do;
      _cnt+1;
      max_cnt=max(_cnt,max_cnt);
    end;

  if last.customer_id then output;
run;

data want;
  length seq_nr 8;
  merge have inter;
  by customer_id;
  if first.customer_id then seq_nr=1;
  else seq_nr+1;
run;

proc print data=want;
run;

Patrick_0-1593305300097.png

 

AkshayS
Fluorite | Level 6

Thank you so much Patrick! It is working for all different set of combinations, which I tried more for testing.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1446 views
  • 1 like
  • 3 in conversation