Hi SAS experts,
I have a list of consumer id, date of first purchase, date of subsequent purchase (if any) and with all these, I want to find out when do people repurchase. Do people repurchase after X number of months?
Below is my code,
proc sql;
create table FUTURE_PURCHASES as
SELECT consumer_id,
case when sub_purchase_date between intnx('day',first_purchase_date,1,'e') and intnx('day',first_purchase_date,30,'e')
then 1 else 0 end as month_1,
case when sub_purchase_date between intnx('day',first_purchase_date,1,'e') and intnx('day',first_purchase_date,61,'e')
then 1 else 0 end as month_2,
case when sub_purchase_date between intnx('day',first_purchase_date,1,'e') and intnx('day',first_purchase_date,91,'e')
then 1 else 0 end as month_3,
case when sub_purchase_date between intnx('day',first_purchase_date,1,'e') and intnx('day',first_purchase_date,122,'e')
then 1 else 0 end as month_4,
case when sub_purchase_date between intnx('day',first_purchase_date,1,'e') and intnx('day',first_purchase_date,152,'e')
then 1 else 0 end as month_5,
case when sub_purchase_date between intnx('day',first_purchase_date,1,'e') and intnx('day',first_purchase_date,183,'e')
then 1 else 0 end as month_6,
case when sub_purchase_date between intnx('day',first_purchase_date,1,'e') and intnx('day',first_purchase_date,213,'e')
then 1 else 0 end as month_7,
case when sub_purchase_date between intnx('day',first_purchase_date,1,'e') and intnx('day',first_purchase_date,244,'e')
then 1 else 0 end as month_8,
case when sub_purchase_date between intnx('day',first_purchase_date,1,'e') and intnx('day',first_purchase_date,274,'e')
then 1 else 0 end as month_9,
case when sub_purchase_date between intnx('day',first_purchase_date,1,'e') and intnx('day',first_purchase_date,305,'e')
then 1 else 0 end as month_10,
case when sub_purchase_date between intnx('day',first_purchase_date,1,'e') and intnx('day',first_purchase_date,335,'e')
then 1 else 0 end as month_11,
case when sub_purchase_date between intnx('day',first_purchase_date,1,'e') and intnx('day',first_purchase_date,366,'e')
then 1 else 0 end as month_12
from TOTAL_SALES
order by consumer_id;
quit;
I know this may the the worst code you have ever seen but I am trying my best to make it shorter. Is there a way to shorten this code?
Thank you!
Do you have to use SQL? This is better as a datastep with an array.
I don't have to use SQL. I am fairly new to SAS and I have never used an array before. I will go check it out now!
I'd suggest using the intck function to find the number of months between sub_purchase_date and first_ purchase_date.
data want;
set have;
array month(12) month1-month12 (0 0 0 0 0 0 0 0 0 0 0 0);
diff=intck('month', first_purchase_date, sub_purchase_date);
month(diff)=1;
run;
Tom's right, you should use 30 day intervals instead of 'months'.
So that's how an array function looks like, I will try it out now!
May take awhile for me to figure it out but i'll use your code as reference.
Thanks Reeza and Tom!
What are you actually trying to do?
Do you really want all of those boolean flag variables or is this just a intermediate step on the way to creating a frequency by month table?
Looks like you are trying to mimic dividing the data into months. Do you want to generate actual months? Since these derived intervals are not actual calendar months then you will probably get better results by using a consistent 30 day interval instead.
This is just an intermediate step. Eventually I will sum everything by month and then find the repurchase rate/trend. I am supposed to compare repurchase trend for members and non-members.
You are right, I should change it to a 30 day interval since it's not following the actual calendar.
Take the difference in days and divide by 30.
Then summarize be the new categorical variable.
data intermediate ;
set TOTAL_SALES ;
month = int( sub_purchase_date - first_purchase_date / 30 ) ;
run;
proc freq ;
tables month;
run;
data have;
infile cards truncover ;
input id first_purchase :mmddyy10. (subpurchase_1-subpurchase_12)(:mmddyy10.);
format first_purchase--subpurchase_12 mmddyys10.;
cards;
1 01/12/2012 01/13/2012 02/23/2012 04/05/2012 05/28/2012 06/02/2012 08/05/2013 12/23/2012 01/03/2013 . . . .
2 01/27/2012 03/04/2012 03/12/2012 04/12/2012 12/15/2012 . . . . . . . . .
;
run;
data want;
set have;
array date subpurchase_:;
array flag flag1-flag12;
do i=1 to dim(date);
if not missing(date(i)) then do;
flag(i)=ifn((date(i)-intnx('day',first_purchase,1, 'e')<=i*30),1,0);
end;
end;
drop i;
run;
data want;
set have;
month_1= (first_purchase_date <=sub_purchase_date <= first_purchase_date+30 ) ;
month_2= (first_purchase_date+31 <=sub_purchase_date <= first_purchase_date+61 ) ;
...........
Xia Keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.