## DATA Step, Macro, Functions and more

Solved
Frequent Contributor
Posts: 96

Hi All,

I have a transaction table that contains all the transactions (orders) made by ecah customer. I would like to calculate the difference between orders to detect when they start reducing their orders. I have run the code below, but it doesn't give what I need...For the first customer, the calculation is right but when I get to the second customer (the first row is wrong because it calculates the difference between the last date of the first customer and the first date of the second customer). How can I tell SAS, to stop at the last date of the first customer and start again at the first date of the first customer.

I have tried to use First function but it didon't work.. Your help wouldf be much appreciated. Many Thanks

DATA MODELLING;

SET MODELLING;

/*DIFFERENCE TIME BETWEEN ORDERS*/

BY SOLD_TO_ORG_ID DELIVERY_DATE;

/*  IF FIRST.SOLD_TO_ORG_ID AND FIRST.DELIVERY_DATE THEN DO;*/

LAG_TIME = LAG(DELIVERY_DATE);

DIF_TIME = DELIVERY_DATE - LAG_TIME;

PER_TIME_INCREASE =(DIF_TIME/LAG_TIME)*100;

LAG_VALUE = LAG(TOTAL_SPEND);

DIF_VALUE = TOTAL_SPEND - LAG_VALUE;

PER_VALUE_INCREASE =(DIF_VALUE/LAG_VALUE)*100;

/*  END;*/

/*  OUTPUT;*/

RUN;

Accepted Solutions
Solution
‎03-20-2013 04:19 PM
Contributor
Posts: 33

I don't understand the calculation of the increase/decrease, so I leave that to you. Taking the time-calculation: you divide the number of days between orders by the first date. That means that if they order exactly every 30 days you will see a decreasing result as the order date increases.

The calculation of the time between two purchases can be calculated as you do, but you have to make the calculated value missing every time you start with a new SOLD_TO_ORG_ID, like in the code below. By the way, you can use the DIF function to replace the combination of the LAG function and the subtraction.

data demo;

input SOLD_TO_ORG_ID DELIVERY_DATE : date. total_spend;

format delivery_date date.;

datalines;

1 12APR12  25

1 24apr12  71

1 13may12  321

1 04jun12  59

1 15jul12  77

2 19mar12  91

2 23may12  143

3 29mar12  86

3 6apr12   47

3 17apr12  66

3 16may12  91

3 21jun12  112

run;

data modeling;

set demo;

by SOLD_TO_ORG_ID DELIVERY_DATE ;

DIF_TIME = DIF(DELIVERY_DATE);

DIF_VALUE = dif(TOTAL_SPEND);

if first.sold_to_org_id then do;

dif_time=.;

dif_value=.;

end;

run;

ods listing;

options nocenter;

proc print;

run;

All Replies
Super User
Posts: 5,884

You are on the right track with the commented lines, using first. logic will solve it. But you need to use NOT first.

Data never sleeps
Frequent Contributor
Posts: 96

Hi LinusH,

Thanks for your reply...when you say I need to use NOT first, where exactly in the program?

IF FIRST.SOLD_TO_ORG_ID AND NOT FIRST.DELIVERY_DATE THEN DO;??

Super User
Posts: 5,884

Up to you to decide - for which rows do you want to calculate differences.

If only on the same day, you need to NOT both.

Data never sleeps
Frequent Contributor
Posts: 96

Hi LinusH,

I would like to calculate the difference for every row per customer. For example if a customer has 10 transactions row, I would like to calculate the diff between 1st order and 2nd order, the diff between 2nd order and 3rd order, and difference between 3rd and 4th order so on...And when I get to the second customer, I would like to follow the same logic. Does it make sense?

Many thanks

Alice

Super User
Posts: 5,884

Can't have any opinions on what you are calculating.

But as you describe it, you shouldn't test on date, just sold_to_org_id.

Data never sleeps
Solution
‎03-20-2013 04:19 PM
Contributor
Posts: 33

I don't understand the calculation of the increase/decrease, so I leave that to you. Taking the time-calculation: you divide the number of days between orders by the first date. That means that if they order exactly every 30 days you will see a decreasing result as the order date increases.

The calculation of the time between two purchases can be calculated as you do, but you have to make the calculated value missing every time you start with a new SOLD_TO_ORG_ID, like in the code below. By the way, you can use the DIF function to replace the combination of the LAG function and the subtraction.

data demo;

input SOLD_TO_ORG_ID DELIVERY_DATE : date. total_spend;

format delivery_date date.;

datalines;

1 12APR12  25

1 24apr12  71

1 13may12  321

1 04jun12  59

1 15jul12  77

2 19mar12  91

2 23may12  143

3 29mar12  86

3 6apr12   47

3 17apr12  66

3 16may12  91

3 21jun12  112

run;

data modeling;

set demo;

by SOLD_TO_ORG_ID DELIVERY_DATE ;

DIF_TIME = DIF(DELIVERY_DATE);

DIF_VALUE = dif(TOTAL_SPEND);

if first.sold_to_org_id then do;

dif_time=.;

dif_value=.;

end;

run;

ods listing;

options nocenter;

proc print;

run;

Frequent Contributor
Posts: 96

Hi EricT,

Many Thanks the code works perfectly...

Cheers

Alice

🔒 This topic is solved and locked.