BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Stretlow
Obsidian | Level 7

 

Hi there.

 

I'm a little puzzled on how to do the following and am hoping the community could possible help.

 

I have a data set that holds 2 months worth of data.

 

We have 3 variables

 

CustomerID  Date  Amount

 

What im trying to do is bring back all the observations where the amount in any 2 day window in the two months exceeds 500

 

so a customer might have the following 

CustomerID  Date              Amount

1                  01/01/2017    300

1                  02/01/2017    300

1                  03/01/2017    400

1                  07/01/2017    200

 

I would like to see the following output

 

CustomerID Amount

1                 600

1                 700

 

so it would total the amounts from the 1st and 2nd then the 2nd and 3rd but not the 7th as its not in the 2 day window.

 

 

Hope  this makes sense.

 

Many Thanks

 

Stret

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

I'd use a slight variation of @gamotte's code to deal with multiple customers:

data want;
set have;
by CustomerId;
keep CustomerId totalAmount;
lAmount = lag(Amount);
lDate = lag(Date);
totalAmount = sum(Amount,lAmount);
if not first.CustomerId and totalAmount >= 500 and Date = intnx('day',lDate,1);
run;

Dataset have needs to be sorted by CustomerId and Date.

View solution in original post

11 REPLIES 11
gamotte
Rhodochrosite | Level 12

Hello,

 

data want;
	set have;

	keep CustomerId totalAmount;

	lAmount=lag(Amount);
	lDate=lag(Date);

	if _N_>1 then do;
		totalAmount=Amount+lAmount;
	end;

	if totalAmount>=500 and Date=intnx('day',lDate,1);
run;
Kurt_Bremser
Super User

I'd use a slight variation of @gamotte's code to deal with multiple customers:

data want;
set have;
by CustomerId;
keep CustomerId totalAmount;
lAmount = lag(Amount);
lDate = lag(Date);
totalAmount = sum(Amount,lAmount);
if not first.CustomerId and totalAmount >= 500 and Date = intnx('day',lDate,1);
run;

Dataset have needs to be sorted by CustomerId and Date.

mkeintz
PROC Star

This is a good  problem to use both LAG and its related DIF function:

 

data want; 

   set have;

   by id;

 

   twoday_total=sum(amount,lag(amount));

   if dif(date)=1 and first.id=0 and twoday_total>500;

run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

Modify to accomodate multiple records on the same date:


@mkeintz wrote:

This is a good  problem to use both LAG and its related DIF function:

 

data want; 

   set have;

   by id;

 

   twoday_total=sum(amount,lag(amount));

   if dif(date)=1 and first.id=0 and twoday_total>500;

run;

 

 


 

I don't see a reason to make an intermediate dataset with daily totals to solve this problem.  It can be accomodated in the same data step:

 

data want;

  set have;

  by id date;

 

  retain first_date .;

  if first.id then first_date=date;

 

  dailysum+amount;

  if first.date then dailysum=amount;

 

  if last.date then do;

    twoday_total=sum(dailysum,lag(dailysum);

    if dif(date)=1 and date^=first_date and twoday_total>500;

  end;

 

/* editted changes */

  if last.date;

  twoday_total=sum(dailysum,lag(dailysum);

  if dif(date)=1 and date^=first_date and twoday_total>500;

 

run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Haikuo
Onyx | Level 15

You already have better answers there, here is just to show you an unorthodox way of doing it, and there are times you could leverag this kind of programming:

 

data have;
input CustomerID  Date:ddmmyy10.              Amount;
cards;
1                  01/01/2017    300
1                  02/01/2017    300
1                  03/01/2017    400
1                  07/01/2017    200
;

proc sql;
select t1.customerid, sum(t1.amount,t2.amount) as amount
from have t1, have t2
where t1.customerid=t2.customerid
and t1.date-t2.date=1
and calculated amount >500
;
quit;
Stretlow
Obsidian | Level 7
Thanks for the responses, i'm testing them now... will report back
Stretlow
Obsidian | Level 7

Hi Again.

 

Both the examples above (not the sql) seem to be working the same way however it is missing some.

 

So i have a customer with the following

 

27/01/2017 £73.18

27/01/2017 £87.00

28/01/2017 £900.00

 

and the code is creating a total column with £987 which is fine as its over £500, but it isnt including the £73.18 in the total.

 

On the same customer i have the following

 

27/02/2017 £73.18

27/02/2017 £87.00

28/02/2017 £137.00

28/02/2017 £29.00

28/02/2017 £12.00

28/02/2017 £900.00

 

and it hasnt included that in the output?

 

I'm not sure the reason for this.

 

Help continue to be appreciated

Kurt_Bremser
Super User

That's because we all were expecting only one record per customer and date.

So you should add an intermediate step that sums up per customer and date. Otherwise the lagging becomes unwieldy.

Stretlow
Obsidian | Level 7

@Kurt_Bremser wrote:

PS you just learned your first lesson in creating example data 🙂

"Cover all bases", ie have all possible test cases in it.


 

Noted Smiley Happy.

 

Thanks very much for the help......

Ksharp
Super User
data have;
input CustomerID  Date:ddmmyy10.              Amount;
format date ddmmyy10.;
cards;
1                  01/01/2017    300
1                  02/01/2017    300
1                  03/01/2017    400
1                  07/01/2017    200
;

proc sql;
select *,(select sum(amount) from have where customerid=a.customerid and
date between a.date-1 and a.date) as sum_amount
from have as a
where calculated sum_amount >500
;
quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1150 views
  • 4 likes
  • 6 in conversation