Hi All,
I have a transaction table that contains all the transactions (orders) made by ecah customer. I would like to calculate the difference between orders to detect when they start reducing their orders. I have run the code below, but it doesn't give what I need...For the first customer, the calculation is right but when I get to the second customer (the first row is wrong because it calculates the difference between the last date of the first customer and the first date of the second customer). How can I tell SAS, to stop at the last date of the first customer and start again at the first date of the first customer.
I have tried to use First function but it didon't work.. Your help wouldf be much appreciated. Many Thanks
DATA MODELLING;
SET MODELLING;
/*DIFFERENCE TIME BETWEEN ORDERS*/
BY SOLD_TO_ORG_ID DELIVERY_DATE;
/* IF FIRST.SOLD_TO_ORG_ID AND FIRST.DELIVERY_DATE THEN DO;*/
LAG_TIME = LAG(DELIVERY_DATE);
DIF_TIME = DELIVERY_DATE - LAG_TIME;
PER_TIME_INCREASE =(DIF_TIME/LAG_TIME)*100;
LAG_VALUE = LAG(TOTAL_SPEND);
DIF_VALUE = TOTAL_SPEND - LAG_VALUE;
PER_VALUE_INCREASE =(DIF_VALUE/LAG_VALUE)*100;
/* END;*/
/* OUTPUT;*/
RUN;
I don't understand the calculation of the increase/decrease, so I leave that to you. Taking the time-calculation: you divide the number of days between orders by the first date. That means that if they order exactly every 30 days you will see a decreasing result as the order date increases.
The calculation of the time between two purchases can be calculated as you do, but you have to make the calculated value missing every time you start with a new SOLD_TO_ORG_ID, like in the code below. By the way, you can use the DIF function to replace the combination of the LAG function and the subtraction.
data demo;
input SOLD_TO_ORG_ID DELIVERY_DATE : date. total_spend;
format delivery_date date.;
datalines;
1 12APR12 25
1 24apr12 71
1 13may12 321
1 04jun12 59
1 15jul12 77
2 19mar12 91
2 23may12 143
3 29mar12 86
3 6apr12 47
3 17apr12 66
3 16may12 91
3 21jun12 112
run;
data modeling;
set demo;
by SOLD_TO_ORG_ID DELIVERY_DATE ;
DIF_TIME = DIF(DELIVERY_DATE);
DIF_VALUE = dif(TOTAL_SPEND);
if first.sold_to_org_id then do;
dif_time=.;
dif_value=.;
end;
run;
ods listing;
options nocenter;
proc print;
run;
You are on the right track with the commented lines, using first. logic will solve it. But you need to use NOT first.
Hi LinusH,
Thanks for your reply...when you say I need to use NOT first, where exactly in the program?
IF FIRST.SOLD_TO_ORG_ID AND NOT FIRST.DELIVERY_DATE THEN DO;??
Up to you to decide - for which rows do you want to calculate differences.
If only on the same day, you need to NOT both.
Hi LinusH,
I would like to calculate the difference for every row per customer. For example if a customer has 10 transactions row, I would like to calculate the diff between 1st order and 2nd order, the diff between 2nd order and 3rd order, and difference between 3rd and 4th order so on...And when I get to the second customer, I would like to follow the same logic. Does it make sense?
Many thanks
Alice
Can't have any opinions on what you are calculating.
But as you describe it, you shouldn't test on date, just sold_to_org_id.
I don't understand the calculation of the increase/decrease, so I leave that to you. Taking the time-calculation: you divide the number of days between orders by the first date. That means that if they order exactly every 30 days you will see a decreasing result as the order date increases.
The calculation of the time between two purchases can be calculated as you do, but you have to make the calculated value missing every time you start with a new SOLD_TO_ORG_ID, like in the code below. By the way, you can use the DIF function to replace the combination of the LAG function and the subtraction.
data demo;
input SOLD_TO_ORG_ID DELIVERY_DATE : date. total_spend;
format delivery_date date.;
datalines;
1 12APR12 25
1 24apr12 71
1 13may12 321
1 04jun12 59
1 15jul12 77
2 19mar12 91
2 23may12 143
3 29mar12 86
3 6apr12 47
3 17apr12 66
3 16may12 91
3 21jun12 112
run;
data modeling;
set demo;
by SOLD_TO_ORG_ID DELIVERY_DATE ;
DIF_TIME = DIF(DELIVERY_DATE);
DIF_VALUE = dif(TOTAL_SPEND);
if first.sold_to_org_id then do;
dif_time=.;
dif_value=.;
end;
run;
ods listing;
options nocenter;
proc print;
run;
Hi EricT,
Many Thanks the code works perfectly...
Cheers
Alice
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.