Contributor
Posts: 58

# Count of observations by id and date if there are 4 or more transactions in a 2 day window

Hi there, sorry second question of the day today.

I have a large data set with numerous variables two of which are "ID" and "TransactionDate"

what I want to do is count the number of transactions in two day windows per id

Example.

ID        TransactionDate

1          01/11/2016

1          01/11/2016

1          02/11/2016

1          02/11/2016

2          01/11/2016

2          07/11/2016

3          03/11/2016

3          04/11/2016

3          04/11/2016

3          04/11/2016

I only want to see ID 1 and 3 as they have 4 records in a two day window.

I've had a play around with proq freq and the dif and Lag functions but I just cant seem to work it out.

Marc

Super User
Posts: 9,227

## Re: Count of observations by id and date if there are 4 or more transactions in a 2 day window

Well, step 1 is to identify records which conform to your "2 day window" criteria.  You would need to flesh this out for me a bit, is it 2 days from earliest date, 2 days from previous etc.

Wonce you have that then a simple count() group by id <var>; would work.

Super User
Posts: 9,611

## Re: Count of observations by id and date if there are 4 or more transactions in a 2 day window

``````data have;
input id transactiondate :ddmmyy10.;
format transactiondate ddmmyy10.;
cards;
1          01/11/2016
1          01/11/2016
1          02/11/2016
1          02/11/2016
2          01/11/2016
2          07/11/2016
3          03/11/2016
3          04/11/2016
3          04/11/2016
3          04/11/2016
;
run;

/* First accumulate per day */
proc sql;
create table int as
select id, transactiondate, count(*) as count
from have
group by id,transactiondate
;
quit;

/* Now compare with previous observation */
data want;
set int;
by id;
lastdate = lag(transactiondate);
format lastdate ddmmyy10.;
lastcount = lag(count);
if lastdate = transactiondate - 1
then do;
count = count + lastcount;
output;
end;
keep id transactiondate count lastdate;
run;``````
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 58

## Re: Count of observations by id and date if there are 4 or more transactions in a 2 day window

KurtBremser wrote:
``````data have;
input id transactiondate :ddmmyy10.;
format transactiondate ddmmyy10.;
cards;
1          01/11/2016
1          01/11/2016
1          02/11/2016
1          02/11/2016
2          01/11/2016
2          07/11/2016
3          03/11/2016
3          04/11/2016
3          04/11/2016
3          04/11/2016
;
run;

/* First accumulate per day */
proc sql;
create table int as
select id, transactiondate, count(*) as count
from have
group by id,transactiondate
;
quit;

/* Now compare with previous observation */
data want;
set int;
by id;
lastdate = lag(transactiondate);
format lastdate ddmmyy10.;
lastcount = lag(count);
if lastdate = transactiondate - 1
then do;
count = count + lastcount;
output;
end;
keep id transactiondate count lastdate;
run;``````

Hi.

Thanks for this. the second part of the code there is what i was trying to do but i was a little out.

It looks to be working in the main however I have found an example where there are 4 observations on one day that are not in the last output dataset.

Do you know why this would be?

Thanks

Stret

Super User
Posts: 9,611

## Re: Count of observations by id and date if there are 4 or more transactions in a 2 day window

I see. I am specifically checking for two days with data.

``````data want;
set int;
by id;
lastdate = lag(transactiondate);
format lastdate ddmmyy10.;
lastcount = lag(count);
if lastdate = transactiondate - 1
then do;
count = count + lastcount;
if count >= 4 then output;
end;
else do;
if count >= 4 then output;
end;
keep id transactiondate count lastdate;
run;``````

You might get redundant entries if you have a sequence like 3-3-3 on three successive days.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Posts: 1,288

## Re: Count of observations by id and date if there are 4 or more transactions in a 2 day window

If your data are sorted by id/transaction date, then this will be efficient and simple:

notes:

1. The do loop contructs a count of transactions (_NTRANS) for a single day
2. Then a two-record count is calculated
3. Only if the prior record was one day preceding the current record, keep it
4. The first program, however, will output overlapping two-day windows (i.e. 3 consecutive dates).  If you don't  want that, use the second program
``````data want (drop=_:);
do _ntrans=1 by 1 until (last.transactiondate);
set have;
by id transactiondate ;
end;

twoday_count=sum(_ntrans,lag(_ntrans));
if dif(transactiondate)=1;
run;
``````

``````data want2 (drop=_:);
retain _last_windowdate;  format _last_windowdate yymmddn8.;
do _ntrans=1 by 1 until (last.transactiondate);
set have;
by id transactiondate ;
if first.id then _last_windowdate=.;
end;
twoday_count=sum(_ntrans,lag(_ntrans));
if dif(transactiondate)=1  and (_last_windowdate^=transactiondate-2);
_last_windowdate=transactiondate;
run;
``````

Contributor
Posts: 58

## Re: Count of observations by id and date if there are 4 or more transactions in a 2 day window

mkeintz wrote:

If your data are sorted by id/transaction date, then this will be efficient and simple:

notes:

1. The do loop contructs a count of transactions (_NTRANS) for a single day
2. Then a two-record count is calculated
3. Only if the prior record was one day preceding the current record, keep it
4. The first program, however, will output overlapping two-day windows (i.e. 3 consecutive dates).  If you don't  want that, use the second program
``````data want (drop=_:);
do _ntrans=1 by 1 until (last.transactiondate);
set have;
by id transactiondate ;
end;

twoday_count=sum(_ntrans,lag(_ntrans));
if dif(transactiondate)=1;
run;
``````

Hi.

This again seems to be working fine other than the anomoly i posted above.

I have one particular ID that has 8 Observations.

4 on the 18/11/2016

2 on the 29/11/2016

2 on the 30/11/2016

Its output the 29th and 30th however not the 4 on the 18th.

Thanks for the help

Posts: 1,288

## Re: Count of observations by id and date if there are 4 or more transactions in a 2 day window

Oops sorry.  I see you mean 4 trans  WITHIN two days, even if only one of those days is present:

BTW, notice the "trick" here is to update the LAG/DIF queues only at the end of each day's set of transactions.  Also, even though the "lag(_ntrans)" is inside an IFN function, the lag queue is updated regardless of whether the ifn test is true.  Which is what you want - the lag queue is always updated even if its results are not returned by the IFN function.

data want;

do _ntrans=1 by 1 until(last.transactiondate);

set have;

by id transactiondate;

end;

total_count = sum(_ntrans,ifn(dif(transactiondate)=1 and dif(id)=0,lag(_ntrans),0));

if total_count>=4;

run;

Contributor
Posts: 58

## Re: Count of observations by id and date if there are 4 or more transactions in a 2 day window

mkeintz wrote:

Oops sorry.  I see you mean 4 trans  WITHIN two days, even if only one of those days is present:

BTW, notice the "trick" here is to update the LAG/DIF queues only at the end of each day's set of transactions.  Also, even though the "lag(_ntrans)" is inside an IFN function, the lag queue is updated regardless of whether the ifn test is true.  Which is what you want - the lag queue is always updated even if its results are not returned by the IFN function.

data want;

do _ntrans=1 by 1 until(last.transactiondate);

set have;

by id transactiondate;

end;

total_count = sum(_ntrans,ifn(dif(transactiondate)=1 and dif(id)=0,lag(_ntrans),0));

if total_count>=4;

run;

This is making a bit more sense now and I think we're almost there with it.

heres a sample of the output I have:-

 ID TransactionDate total_count 1510110787 18/11/2016 4 1510110787 30/11/2016 4 1510141720 26/11/2016 4 1510390852 21/11/2016 5 1510390852 22/11/2016 7 1510390852 28/11/2016 4 1510472816 20/11/2016 4

I think you touched on this earler in the thread about overlapping days.

For ID 1510390852 there are 5 observations on the 21/11/2016 and 2 on the 22/11/2016.

Instead of the output being 5 and 7 for the 21st and 22nd, as these are within a two day window would just like to see 1 observation with a value of 7.

Is this possible?

Obviously for the first Id in that table as the 18th and the 30th arent within a two day window the 2 obervations with a value of 4 are correct.

Stret

Posts: 1,288

## Re: Count of observations by id and date if there are 4 or more transactions in a 2 day window

[ Edited ]

You want

For any pair of consecutive days output total transactions if the total>=4.  This just requires looking back one record.

For any single day not part of such a pair output total transactions if the total>=4.  this will require looking ahead one record to see if the next record is a consecutive date.  If not it's a singel day window.

Now what if you have 3 consecutive  days each with 3 transactions?   output the first pair and ignore the 3rd?  If you have 4 consecutive days with  3 transactions each, presumably you output two non-overlapping windows each with 6 transactions. Is that correct?

``````data want (drop=_: nxt_:);

do _ntrans=1 by 1 until(last.transactiondate);
set have;
by id transactiondate;
if eoh=0 then set have
(firstobs=2 keep=id transactiondate rename=(id=nxt_id  transactiondate=nxt_td))
end=eoh;
end;

_records_since_output+1;

/* Only add the lagged _ntrans if previous rec is one day prior */
total_count = sum(_ntrans,ifn(dif(id)=0 and dif(transactiondate)=1,lag(_ntrans),0));

nxt_day_present=(id=nxt_id and transactiondate+1=nxt_td);

if total_count>=4 and (nxt_day_present=0 or _records_since_output>=2);
_records_since_output=0;
run;
``````

Discussion stats
• 9 replies
• 471 views
• 0 likes
• 4 in conversation