BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RobertNYC
Obsidian | Level 7

Hi all,

 

The mostly Base SAS script below calculates the average time between purchases for a customer based on transaction dates in one variable.  What I would like very much to do is rewrite this using just Proc SQL and to do this all in just one pass.   Here is the full code plus I have broken the script out step by step below.  

 

proc sort data=have ;
by mstr_customer_id transaction_date order_id ;run;
proc sort data=have  out=purchase (keep=mstr_customer_id order_id transaction_date ) nodupkey;
where transaction_sub_type=1 ;
by mstr_customer_id order_id;
 ;run;
data single multiple;
set purchase;
by mstr_customer_id;
if first.mstr_customer_id=1 and last.mstr_customer_id=1 then output single;
else output multiple;run;
proc sort data=multiple; 
by mstr_customer_id transaction_date;run;
data multiple1;
format first_multiple_td_p last_multiple_td_p ;
set  multiple;
format l_transaction_date date8.;
retain first_multiple_td_p;
by mstr_customer_id transaction_date;
l_transaction_date = lag(transaction_date);
if first.mstr_customer_id then do;
l_transaction_date = .;
date_diff=0;
end;
if not first.mstr_customer_id then do;
date_diff = transaction_date-l_transaction_date;
end;
if first.mstr_customer_id then first_multiple_td_p = transaction_date;
if last.mstr_customer_id then last_multiple_td_p = transaction_date;
if last.mstr_customer_id then first_last_diff_p = last_multiple_td_p-first_multiple_td_p;run;
proc sql;
create table avg_bw_purchase as
select distinct mstr_customer_id,
transaction_date,
date_diff,
first_last_diff_p as first_last_diff_pur,
avg(date_diff) as avg_time_bw_pur 
from multiple1
group by mstr_customer_id;quit;
proc sort data=avg_bw_purchase;
by mstr_customer_id transaction_date;run;
data avg_bw_purchase(drop=date_diff transaction_date);
set avg_bw_purchase;
by mstr_customer_id transaction_date;
if  first_last_diff_pur>=0 then output;run;

Again, I need to rewrite the above his using just Proc SQL and to do this all in just one pass. Thanks!! 

 

The data I start with looks like this ....

 

MSTR_CUSTOMER_ID TRANSACTION_DATE ORDER_ID TRANSACTION_SUB_TYPE
46862 12-Aug-17 00500009201708120000257749 1
46862 12-Aug-17 00500009201708120000257749 1
46862 25-Nov-17 00400027201711250000402816 1
46862 25-Nov-17 00400027201711250000402816 1
46862 25-Nov-17 00400027201711250000402816 1
46862 25-Nov-17 00400027201711250000402816 1
46862 25-Nov-17 00400027201711250000402816 1
54691 10-Jul-17 00100140201707100000271298 1
54691 10-Jul-17 00100140201707100000271298 1
54691 10-Jul-17 00100140201707100000271298 1
54691 10-Jul-17 00100140201707100000271298 1
54691 10-Jul-17 00100140201707100000271299 1
54691 10-Jul-17 00100140201707100000271299 1
54691 10-Jul-17 00100140201707100000271299 1
54691 22-Jul-17 00400140201707220000312323 1
54691 22-Jul-17 00400140201707220000312323 1
54691 22-Jul-17 00700140201707220000397098 1
54691 26-Nov-17 00300140201711260000225399 1
54691 26-Nov-17 00300140201711260000225399 1
54691 26-Nov-17 00300140201711260000225399 1
54691 26-Nov-17 00300140201711260000225399 1
54691 26-Nov-17 00300140201711260000225399 1
89553 7-May-17 00200013201705070000243369 1
92118 26-May-17 00300273201705260000126004 1
143541 28-May-17 00200091201705280000399615 1
143541 28-May-17 00200091201705280000399615 1
143541 12-Jun-17 00100091201706120000361014 1
143541 14-Jun-17 00300091201706140000338020 1
144239 10-Sep-17 00400202201709100000007705 1
144239 10-Sep-17 00400202201709100000007705 1
162339 19-Sep-17 00600092201709190000442273 1

 

Step 1 

 

proc sort data=have;
by mstr_customer_id transaction_date order_id ;run;

proc sort data=have out=purchase (keep=mstr_customer_id order_id transaction_date ) nodupkey;
where transaction_sub_type=1 ;
by mstr_customer_id order_id;
;run;

 

 

TRANSACTION_DATE MSTR_CUSTOMER_ID ORDER_ID
12-Aug-17 46862 00500009201708120000257749
25-Nov-17 46862 00400027201711250000402816
10-Jul-17 54691 00100140201707100000271298
10-Jul-17 54691 00100140201707100000271299
22-Jul-17 54691 00400140201707220000312323
22-Jul-17 54691 00700140201707220000397098
26-Nov-17 54691 00300140201711260000225399
10-Dec-17 54691 00900140201712100000334607
28-May-17 143541 00200091201705280000399615
12-Jun-17 143541 00100091201706120000361014
14-Jun-17 143541 00300091201706140000338020

 

Step #2 : separate multiple purchasers from single purchasers

 

 

data single multiple;
set purchase;
by mstr_customer_id;
if first.mstr_customer_id=1 and last.mstr_customer_id=1 then output single;
else output multiple;run;
proc sort data=multiple; 
by mstr_customer_id transaction_date;run;

 

 

TRANSACTION_DATE MSTR_CUSTOMER_ID ORDER_ID
12-Aug-17 46862 00500009201708120000257749
25-Nov-17 46862 00400027201711250000402816
10-Jul-17 54691 00100140201707100000271298
10-Jul-17 54691 00100140201707100000271299
22-Jul-17 54691 00400140201707220000312323
22-Jul-17 54691 00700140201707220000397098
26-Nov-17 54691 00300140201711260000225399
10-Dec-17 54691 00900140201712100000334607
28-May-17 143541 00200091201705280000399615
12-Jun-17 143541 00100091201706120000361014
14-Jun-17 143541 00300091201706140000338020

 

 

Step #3 :Calculate date_diff & first_last_diff_p

 

 

data multiple1;
format first_multiple_td_p last_multiple_td_p ;
set  multiple;
format l_transaction_date date8.;
retain first_multiple_td_p;
by mstr_customer_id transaction_date;
l_transaction_date = lag(transaction_date);
if first.mstr_customer_id then do;
l_transaction_date = .;
date_diff=0;
end;
if not first.mstr_customer_id then do;
date_diff = transaction_date-l_transaction_date;
end;
if first.mstr_customer_id then first_multiple_td_p = transaction_date;
if last.mstr_customer_id then last_multiple_td_p = transaction_date;
if last.mstr_customer_id then first_last_diff_p = last_multiple_td_p-first_multiple_td_p;run;

 

 

first_multiple_td_p last_multiple_td_p TRANSACTION_DATE MSTR_CUSTOMER_ID ORDER_ID l_transaction_date date_diff first_last_diff_p
11-Aug-57   12-Aug-17 46862 00500009201708120000257749   0  
11-Aug-57 24-Nov-57 25-Nov-17 46862 00400027201711250000402816 12-Aug-17 105 105
9-Jul-57   10-Jul-17 54691 00100140201707100000271298   0  
9-Jul-57   10-Jul-17 54691 00100140201707100000271299 10-Jul-17 0  
9-Jul-57   22-Jul-17 54691 00400140201707220000312323 10-Jul-17 12  
9-Jul-57   22-Jul-17 54691 00700140201707220000397098 22-Jul-17 0  
9-Jul-57   26-Nov-17 54691 00300140201711260000225399 22-Jul-17 127  
9-Jul-57 9-Dec-57 10-Dec-17 54691 00900140201712100000334607 26-Nov-17 14 153
27-May-57   28-May-17 143541 00200091201705280000399615   0  
27-May-57   12-Jun-17 143541 00100091201706120000361014 28-May-17 15  
27-May-57 13-Jun-57 14-Jun-17 143541 00300091201706140000338020 12-Jun-17 2 17

 

 

Step #4 :Aggregate

 

 

proc sql;
create table avg_bw_purchase as
select distinct mstr_customer_id,
transaction_date,
date_diff,
first_last_diff_p as first_last_diff_pur,
avg(date_diff) as avg_time_bw_pur 
from multiple1
group by mstr_customer_id;quit;
MSTR_CUSTOMER_ID TRANSACTION_DATE date_diff first_last_diff_pur avg_time_bw_pur
46862 12Aug2017 0   52.5
46862 25Nov2017 105 105 52.5
54691 10Jul2017 0   25.5
54691 22Jul2017 0   25.5
54691 22Jul2017 12   25.5
54691 26Nov2017 127   25.5
54691 10Dec2017 14 153 25.5
143541 28May2017 0   5.666666667
143541 12Jun2017 15   5.666666667
143541 14Jun2017 2 17 5.666666667

 

Step #5 :Final  - one row for each customer 

 

proc sort data=avg_bw_purchase;
by mstr_customer_id transaction_date;run;
data avg_bw_purchase(drop=date_diff transaction_date);
set avg_bw_purchase;
by mstr_customer_id transaction_date;
if  first_last_diff_pur>=0 then output;run;

 

MSTR_CUSTOMER_ID first_last_diff_pur avg_time_bw_pur
46862 105 52.5
54691 153 25.5
143541 17 5.666667

 

 

Again, any help rewriting this just using Proc SQL and in one pass will be greatly appreciated. Thanks! 

1 ACCEPTED SOLUTION

Accepted Solutions
noling
SAS Employee

I see. You're not interested in items per order, but the number of days between orders (which may contain 1 or 100 items):

 

proc sql noprint;
	create table want as
	select distinct a.MSTR_CUSTOMER_ID, 
		   max(TRANSACTION_DATE) - min(TRANSACTION_DATE) as days, 
		   b.orders_per_ID,
		   (max(TRANSACTION_DATE) - min(TRANSACTION_DATE))/b.orders_per_ID as average
	from have as a, (select  MSTR_CUSTOMER_ID, 
			    count (distinct order_id) as orders_per_id
			 from have
		         group by MSTR_CUSTOMER_ID) as b
	where a.MSTR_CUSTOMER_ID = b.MSTR_CUSTOMER_ID

	group by a.MSTR_CUSTOMER_ID
	having orders_per_id > 1
;
quit;

Also, it looks like some of your counts are off. I see that ID 143541 has 3 distinct order ID's, while you have 4. Also you have 153 days for the difference for 54691, while there are only 139 days between  26NOV2017 and 10JUL2017.


Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

View solution in original post

4 REPLIES 4
noling
SAS Employee

This what you need? I'm dividing between the number of orders, but you may want to do number of orders -1. 

 

If you have 2 orders 10 days apart do you expect the difference to be 10 days or 10 days/2 orders= 5 days?  You can change "count(*)" to "max((count(*) -1, 1)" if desired (not tested). Use the max function so that if there is only 1 order, you don't divide by 0.

 

data have;
	length MSTR_CUSTOMER_ID 8.	TRANSACTION_DATE 8. ORDER_ID $50. TRANSACTION_SUB_TYPE 8.;
	input MSTR_CUSTOMER_ID TRANSACTION_DATE date11. ORDER_ID TRANSACTION_SUB_TYPE;
	format transaction_date date11.;
datalines;
46862 12-Aug-17 00500009201708120000257749 1
46862 12-Aug-17 00500009201708120000257749 1
46862 25-Nov-17 00400027201711250000402816 1
46862 25-Nov-17 00400027201711250000402816 1
46862 25-Nov-17 00400027201711250000402816 1
46862 25-Nov-17 00400027201711250000402816 1
46862 25-Nov-17 00400027201711250000402816 1
54691 10-Jul-17 00100140201707100000271298 1
54691 10-Jul-17 00100140201707100000271298 1
54691 10-Jul-17 00100140201707100000271298 1
54691 10-Jul-17 00100140201707100000271298 1
54691 10-Jul-17 00100140201707100000271299 1
54691 10-Jul-17 00100140201707100000271299 1
54691 10-Jul-17 00100140201707100000271299 1
54691 22-Jul-17 00400140201707220000312323 1
54691 22-Jul-17 00400140201707220000312323 1
54691 22-Jul-17 00700140201707220000397098 1
54691 26-Nov-17 00300140201711260000225399 1
54691 26-Nov-17 00300140201711260000225399 1
54691 26-Nov-17 00300140201711260000225399 1
54691 26-Nov-17 00300140201711260000225399 1
54691 26-Nov-17 00300140201711260000225399 1
89553 7-May-17  00200013201705070000243369 1
92118 26-May-17 00300273201705260000126004 1
143541 28-May-17 0020009120170528000039961 51
143541 28-May-17 0020009120170528000039961 51
143541 12-Jun-17 0010009120170612000036101 41
143541 14-Jun-17 0030009120170614000033802 01
144239 10-Sep-17 0040020220170910000000770 51
144239 10-Sep-17 0040020220170910000000770 51
162339 19-Sep-17 0060009220170919000044227 31
;;;;
run;

proc sql noprint;
	create table want as
	select MSTR_CUSTOMER_ID, max(TRANSACTION_DATE) - min(TRANSACTION_DATE) as days, count (*) as orders, 
		(max(TRANSACTION_DATE) - min(TRANSACTION_DATE))/count(*) as average
	from have
	group by MSTR_CUSTOMER_ID;
quit;

  


Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

RobertNYC
Obsidian | Level 7

 

Thanks noling.  It looks like that's taking the average time between order id and not transaction date 

 

MSTR_CUSTOMER_ID days orders average
46862 105 7 15
54691 153 24 6.375
89553 0 1 0
92118 0 1 0
143541 17 4 4.25
144239 0 2 0
162339 0 1 0

 

All customer with just one purchase are excluded.

 

 So if I customer has more than one purchase I'm looking for average time between each distinct  transaction date.  

 

MSTR_CUSTOMER_ID first_last_diff_pur avg_time_bw_pur
46862 105 52.5
54691 153 25.5
143541 17 5.666667
RobertNYC
Obsidian | Level 7

 

Thanks noling. 

 

It looks like that's taking the average time between order id and not transaction date 

 

MSTR_CUSTOMER_ID days orders average
46862 105 7 15
54691 153 24 6.375
89553 0 1 0
92118 0 1 0
143541 17 4 4.25
144239 0 2 0
162339 0 1 0

 

All customer with just one purchase are excluded.

 

 So if I customer has more than one purchase I'm looking for average time between each distinct  transaction date.  

 

MSTR_CUSTOMER_ID first_last_diff_pur avg_time_bw_pur
46862 105 52.5
54691 153 25.5
143541 17 5.666667
noling
SAS Employee

I see. You're not interested in items per order, but the number of days between orders (which may contain 1 or 100 items):

 

proc sql noprint;
	create table want as
	select distinct a.MSTR_CUSTOMER_ID, 
		   max(TRANSACTION_DATE) - min(TRANSACTION_DATE) as days, 
		   b.orders_per_ID,
		   (max(TRANSACTION_DATE) - min(TRANSACTION_DATE))/b.orders_per_ID as average
	from have as a, (select  MSTR_CUSTOMER_ID, 
			    count (distinct order_id) as orders_per_id
			 from have
		         group by MSTR_CUSTOMER_ID) as b
	where a.MSTR_CUSTOMER_ID = b.MSTR_CUSTOMER_ID

	group by a.MSTR_CUSTOMER_ID
	having orders_per_id > 1
;
quit;

Also, it looks like some of your counts are off. I see that ID 143541 has 3 distinct order ID's, while you have 4. Also you have 153 days for the difference for 54691, while there are only 139 days between  26NOV2017 and 10JUL2017.


Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

SAS Innovate 2025: Register Today!

 

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 2382 views
  • 1 like
  • 2 in conversation