Hi there, sorry second question of the day today.
I have a large data set with numerous variables two of which are "ID" and "TransactionDate"
what I want to do is count the number of transactions in two day windows per id
Example.
ID TransactionDate
1 01/11/2016
1 01/11/2016
1 02/11/2016
1 02/11/2016
2 01/11/2016
2 07/11/2016
3 03/11/2016
3 04/11/2016
3 04/11/2016
3 04/11/2016
I only want to see ID 1 and 3 as they have 4 records in a two day window.
I've had a play around with proq freq and the dif and Lag functions but I just cant seem to work it out.
Thanks in advance.
Marc
Well, step 1 is to identify records which conform to your "2 day window" criteria. You would need to flesh this out for me a bit, is it 2 days from earliest date, 2 days from previous etc.
Wonce you have that then a simple count() group by id <var>; would work.
data have;
input id transactiondate :ddmmyy10.;
format transactiondate ddmmyy10.;
cards;
1 01/11/2016
1 01/11/2016
1 02/11/2016
1 02/11/2016
2 01/11/2016
2 07/11/2016
3 03/11/2016
3 04/11/2016
3 04/11/2016
3 04/11/2016
;
run;
/* First accumulate per day */
proc sql;
create table int as
select id, transactiondate, count(*) as count
from have
group by id,transactiondate
;
quit;
/* Now compare with previous observation */
data want;
set int;
by id;
lastdate = lag(transactiondate);
format lastdate ddmmyy10.;
lastcount = lag(count);
if lastdate = transactiondate - 1
then do;
count = count + lastcount;
output;
end;
keep id transactiondate count lastdate;
run;
@Kurt_Bremser wrote:
data have; input id transactiondate :ddmmyy10.; format transactiondate ddmmyy10.; cards; 1 01/11/2016 1 01/11/2016 1 02/11/2016 1 02/11/2016 2 01/11/2016 2 07/11/2016 3 03/11/2016 3 04/11/2016 3 04/11/2016 3 04/11/2016 ; run; /* First accumulate per day */ proc sql; create table int as select id, transactiondate, count(*) as count from have group by id,transactiondate ; quit; /* Now compare with previous observation */ data want; set int; by id; lastdate = lag(transactiondate); format lastdate ddmmyy10.; lastcount = lag(count); if lastdate = transactiondate - 1 then do; count = count + lastcount; output; end; keep id transactiondate count lastdate; run;
Hi.
Thanks for this. the second part of the code there is what i was trying to do but i was a little out.
It looks to be working in the main however I have found an example where there are 4 observations on one day that are not in the last output dataset.
Do you know why this would be?
Thanks
Stret
I see. I am specifically checking for two days with data.
data want;
set int;
by id;
lastdate = lag(transactiondate);
format lastdate ddmmyy10.;
lastcount = lag(count);
if lastdate = transactiondate - 1
then do;
count = count + lastcount;
if count >= 4 then output;
end;
else do;
if count >= 4 then output;
end;
keep id transactiondate count lastdate;
run;
You might get redundant entries if you have a sequence like 3-3-3 on three successive days.
If your data are sorted by id/transaction date, then this will be efficient and simple:
notes:
data want (drop=_:);
do _ntrans=1 by 1 until (last.transactiondate);
set have;
by id transactiondate ;
end;
twoday_count=sum(_ntrans,lag(_ntrans));
if dif(transactiondate)=1;
run;
data want2 (drop=_:);
retain _last_windowdate; format _last_windowdate yymmddn8.;
do _ntrans=1 by 1 until (last.transactiondate);
set have;
by id transactiondate ;
if first.id then _last_windowdate=.;
end;
twoday_count=sum(_ntrans,lag(_ntrans));
if dif(transactiondate)=1 and (_last_windowdate^=transactiondate-2);
_last_windowdate=transactiondate;
run;
@mkeintz wrote:
If your data are sorted by id/transaction date, then this will be efficient and simple:
notes:
- The do loop contructs a count of transactions (_NTRANS) for a single day
- Then a two-record count is calculated
- Only if the prior record was one day preceding the current record, keep it
- The first program, however, will output overlapping two-day windows (i.e. 3 consecutive dates). If you don't want that, use the second program
data want (drop=_:); do _ntrans=1 by 1 until (last.transactiondate); set have; by id transactiondate ; end; twoday_count=sum(_ntrans,lag(_ntrans)); if dif(transactiondate)=1; run;
Hi.
This again seems to be working fine other than the anomoly i posted above.
I have one particular ID that has 8 Observations.
4 on the 18/11/2016
2 on the 29/11/2016
2 on the 30/11/2016
Its output the 29th and 30th however not the 4 on the 18th.
Thanks for the help
Oops sorry. I see you mean 4 trans WITHIN two days, even if only one of those days is present:
BTW, notice the "trick" here is to update the LAG/DIF queues only at the end of each day's set of transactions. Also, even though the "lag(_ntrans)" is inside an IFN function, the lag queue is updated regardless of whether the ifn test is true. Which is what you want - the lag queue is always updated even if its results are not returned by the IFN function.
data want;
do _ntrans=1 by 1 until(last.transactiondate);
set have;
by id transactiondate;
end;
total_count = sum(_ntrans,ifn(dif(transactiondate)=1 and dif(id)=0,lag(_ntrans),0));
if total_count>=4;
run;
@mkeintz wrote:
Oops sorry. I see you mean 4 trans WITHIN two days, even if only one of those days is present:
BTW, notice the "trick" here is to update the LAG/DIF queues only at the end of each day's set of transactions. Also, even though the "lag(_ntrans)" is inside an IFN function, the lag queue is updated regardless of whether the ifn test is true. Which is what you want - the lag queue is always updated even if its results are not returned by the IFN function.
data want;
do _ntrans=1 by 1 until(last.transactiondate);
set have;
by id transactiondate;
end;
total_count = sum(_ntrans,ifn(dif(transactiondate)=1 and dif(id)=0,lag(_ntrans),0));
if total_count>=4;
run;
This is making a bit more sense now and I think we're almost there with it.
heres a sample of the output I have:-
ID | TransactionDate | total_count |
1510110787 | 18/11/2016 | 4 |
1510110787 | 30/11/2016 | 4 |
1510141720 | 26/11/2016 | 4 |
1510390852 | 21/11/2016 | 5 |
1510390852 | 22/11/2016 | 7 |
1510390852 | 28/11/2016 | 4 |
1510472816 | 20/11/2016 | 4 |
I think you touched on this earler in the thread about overlapping days.
For ID 1510390852 there are 5 observations on the 21/11/2016 and 2 on the 22/11/2016.
Instead of the output being 5 and 7 for the 21st and 22nd, as these are within a two day window would just like to see 1 observation with a value of 7.
Is this possible?
Obviously for the first Id in that table as the 18th and the 30th arent within a two day window the 2 obervations with a value of 4 are correct.
Your continued help is appreciated
Stret
You want
For any pair of consecutive days output total transactions if the total>=4. This just requires looking back one record.
For any single day not part of such a pair output total transactions if the total>=4. this will require looking ahead one record to see if the next record is a consecutive date. If not it's a singel day window.
Now what if you have 3 consecutive days each with 3 transactions? output the first pair and ignore the 3rd? If you have 4 consecutive days with 3 transactions each, presumably you output two non-overlapping windows each with 6 transactions. Is that correct?
data want (drop=_: nxt_:);
do _ntrans=1 by 1 until(last.transactiondate);
set have;
by id transactiondate;
if eoh=0 then set have
(firstobs=2 keep=id transactiondate rename=(id=nxt_id transactiondate=nxt_td))
end=eoh;
end;
_records_since_output+1;
/* Only add the lagged _ntrans if previous rec is one day prior */
total_count = sum(_ntrans,ifn(dif(id)=0 and dif(transactiondate)=1,lag(_ntrans),0));
nxt_day_present=(id=nxt_id and transactiondate+1=nxt_td);
if total_count>=4 and (nxt_day_present=0 or _records_since_output>=2);
_records_since_output=0;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.