Hi there.
I'm a little puzzled on how to do the following and am hoping the community could possible help.
I have a data set that holds 2 months worth of data.
We have 3 variables
CustomerID Date Amount
What im trying to do is bring back all the observations where the amount in any 2 day window in the two months exceeds 500
so a customer might have the following
CustomerID Date Amount
1 01/01/2017 300
1 02/01/2017 300
1 03/01/2017 400
1 07/01/2017 200
I would like to see the following output
CustomerID Amount
1 600
1 700
so it would total the amounts from the 1st and 2nd then the 2nd and 3rd but not the 7th as its not in the 2 day window.
Hope this makes sense.
Many Thanks
Stret
I'd use a slight variation of @gamotte's code to deal with multiple customers:
data want;
set have;
by CustomerId;
keep CustomerId totalAmount;
lAmount = lag(Amount);
lDate = lag(Date);
totalAmount = sum(Amount,lAmount);
if not first.CustomerId and totalAmount >= 500 and Date = intnx('day',lDate,1);
run;
Dataset have needs to be sorted by CustomerId and Date.
Hello,
data want;
set have;
keep CustomerId totalAmount;
lAmount=lag(Amount);
lDate=lag(Date);
if _N_>1 then do;
totalAmount=Amount+lAmount;
end;
if totalAmount>=500 and Date=intnx('day',lDate,1);
run;
I'd use a slight variation of @gamotte's code to deal with multiple customers:
data want;
set have;
by CustomerId;
keep CustomerId totalAmount;
lAmount = lag(Amount);
lDate = lag(Date);
totalAmount = sum(Amount,lAmount);
if not first.CustomerId and totalAmount >= 500 and Date = intnx('day',lDate,1);
run;
Dataset have needs to be sorted by CustomerId and Date.
This is a good problem to use both LAG and its related DIF function:
data want;
set have;
by id;
twoday_total=sum(amount,lag(amount));
if dif(date)=1 and first.id=0 and twoday_total>500;
run;
Modify to accomodate multiple records on the same date:
@mkeintz wrote:
This is a good problem to use both LAG and its related DIF function:
data want;
set have;
by id;
twoday_total=sum(amount,lag(amount));
if dif(date)=1 and first.id=0 and twoday_total>500;
run;
I don't see a reason to make an intermediate dataset with daily totals to solve this problem. It can be accomodated in the same data step:
data want;
set have;
by id date;
retain first_date .;
if first.id then first_date=date;
dailysum+amount;
if first.date then dailysum=amount;
if last.date then do;
twoday_total=sum(dailysum,lag(dailysum);
if dif(date)=1 and date^=first_date and twoday_total>500;
end;
/* editted changes */
if last.date;
twoday_total=sum(dailysum,lag(dailysum);
if dif(date)=1 and date^=first_date and twoday_total>500;
run;
You already have better answers there, here is just to show you an unorthodox way of doing it, and there are times you could leverag this kind of programming:
data have;
input CustomerID Date:ddmmyy10. Amount;
cards;
1 01/01/2017 300
1 02/01/2017 300
1 03/01/2017 400
1 07/01/2017 200
;
proc sql;
select t1.customerid, sum(t1.amount,t2.amount) as amount
from have t1, have t2
where t1.customerid=t2.customerid
and t1.date-t2.date=1
and calculated amount >500
;
quit;
Hi Again.
Both the examples above (not the sql) seem to be working the same way however it is missing some.
So i have a customer with the following
27/01/2017 £73.18
27/01/2017 £87.00
28/01/2017 £900.00
and the code is creating a total column with £987 which is fine as its over £500, but it isnt including the £73.18 in the total.
On the same customer i have the following
27/02/2017 £73.18
27/02/2017 £87.00
28/02/2017 £137.00
28/02/2017 £29.00
28/02/2017 £12.00
28/02/2017 £900.00
and it hasnt included that in the output?
I'm not sure the reason for this.
Help continue to be appreciated
That's because we all were expecting only one record per customer and date.
So you should add an intermediate step that sums up per customer and date. Otherwise the lagging becomes unwieldy.
PS you just learned your first lesson in creating example data 🙂
"Cover all bases", ie have all possible test cases in it.
@Kurt_Bremser wrote:
PS you just learned your first lesson in creating example data 🙂
"Cover all bases", ie have all possible test cases in it.
Noted .
Thanks very much for the help......
data have;
input CustomerID Date:ddmmyy10. Amount;
format date ddmmyy10.;
cards;
1 01/01/2017 300
1 02/01/2017 300
1 03/01/2017 400
1 07/01/2017 200
;
proc sql;
select *,(select sum(amount) from have where customerid=a.customerid and
date between a.date-1 and a.date) as sum_amount
from have as a
where calculated sum_amount >500
;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.