Hi there, sorry second question of the day today.
I have a large data set with numerous variables two of which are "ID" and "TransactionDate"
what I want to do is count the number of transactions in two day windows per id
Example.
ID TransactionDate
1 01/11/2016
1 01/11/2016
1 02/11/2016
1 02/11/2016
2 01/11/2016
2 07/11/2016
3 03/11/2016
3 04/11/2016
3 04/11/2016
3 04/11/2016
I only want to see ID 1 and 3 as they have 4 records in a two day window.
I've had a play around with proq freq and the dif and Lag functions but I just cant seem to work it out.
Thanks in advance.
Marc
Well, step 1 is to identify records which conform to your "2 day window" criteria. You would need to flesh this out for me a bit, is it 2 days from earliest date, 2 days from previous etc.
Wonce you have that then a simple count() group by id <var>; would work.
data have;
input id transactiondate :ddmmyy10.;
format transactiondate ddmmyy10.;
cards;
1 01/11/2016
1 01/11/2016
1 02/11/2016
1 02/11/2016
2 01/11/2016
2 07/11/2016
3 03/11/2016
3 04/11/2016
3 04/11/2016
3 04/11/2016
;
run;
/* First accumulate per day */
proc sql;
create table int as
select id, transactiondate, count(*) as count
from have
group by id,transactiondate
;
quit;
/* Now compare with previous observation */
data want;
set int;
by id;
lastdate = lag(transactiondate);
format lastdate ddmmyy10.;
lastcount = lag(count);
if lastdate = transactiondate - 1
then do;
count = count + lastcount;
output;
end;
keep id transactiondate count lastdate;
run;
@Kurt_Bremser wrote:
data have; input id transactiondate :ddmmyy10.; format transactiondate ddmmyy10.; cards; 1 01/11/2016 1 01/11/2016 1 02/11/2016 1 02/11/2016 2 01/11/2016 2 07/11/2016 3 03/11/2016 3 04/11/2016 3 04/11/2016 3 04/11/2016 ; run; /* First accumulate per day */ proc sql; create table int as select id, transactiondate, count(*) as count from have group by id,transactiondate ; quit; /* Now compare with previous observation */ data want; set int; by id; lastdate = lag(transactiondate); format lastdate ddmmyy10.; lastcount = lag(count); if lastdate = transactiondate - 1 then do; count = count + lastcount; output; end; keep id transactiondate count lastdate; run;
Hi.
Thanks for this. the second part of the code there is what i was trying to do but i was a little out.
It looks to be working in the main however I have found an example where there are 4 observations on one day that are not in the last output dataset.
Do you know why this would be?
Thanks
Stret
I see. I am specifically checking for two days with data.
data want;
set int;
by id;
lastdate = lag(transactiondate);
format lastdate ddmmyy10.;
lastcount = lag(count);
if lastdate = transactiondate - 1
then do;
count = count + lastcount;
if count >= 4 then output;
end;
else do;
if count >= 4 then output;
end;
keep id transactiondate count lastdate;
run;
You might get redundant entries if you have a sequence like 3-3-3 on three successive days.
If your data are sorted by id/transaction date, then this will be efficient and simple:
notes:
data want (drop=_:);
do _ntrans=1 by 1 until (last.transactiondate);
set have;
by id transactiondate ;
end;
twoday_count=sum(_ntrans,lag(_ntrans));
if dif(transactiondate)=1;
run;
data want2 (drop=_:);
retain _last_windowdate; format _last_windowdate yymmddn8.;
do _ntrans=1 by 1 until (last.transactiondate);
set have;
by id transactiondate ;
if first.id then _last_windowdate=.;
end;
twoday_count=sum(_ntrans,lag(_ntrans));
if dif(transactiondate)=1 and (_last_windowdate^=transactiondate-2);
_last_windowdate=transactiondate;
run;
@mkeintz wrote:
If your data are sorted by id/transaction date, then this will be efficient and simple:
notes:
- The do loop contructs a count of transactions (_NTRANS) for a single day
- Then a two-record count is calculated
- Only if the prior record was one day preceding the current record, keep it
- The first program, however, will output overlapping two-day windows (i.e. 3 consecutive dates). If you don't want that, use the second program
data want (drop=_:); do _ntrans=1 by 1 until (last.transactiondate); set have; by id transactiondate ; end; twoday_count=sum(_ntrans,lag(_ntrans)); if dif(transactiondate)=1; run;
Hi.
This again seems to be working fine other than the anomoly i posted above.
I have one particular ID that has 8 Observations.
4 on the 18/11/2016
2 on the 29/11/2016
2 on the 30/11/2016
Its output the 29th and 30th however not the 4 on the 18th.
Thanks for the help
Oops sorry. I see you mean 4 trans WITHIN two days, even if only one of those days is present:
BTW, notice the "trick" here is to update the LAG/DIF queues only at the end of each day's set of transactions. Also, even though the "lag(_ntrans)" is inside an IFN function, the lag queue is updated regardless of whether the ifn test is true. Which is what you want - the lag queue is always updated even if its results are not returned by the IFN function.
data want;
do _ntrans=1 by 1 until(last.transactiondate);
set have;
by id transactiondate;
end;
total_count = sum(_ntrans,ifn(dif(transactiondate)=1 and dif(id)=0,lag(_ntrans),0));
if total_count>=4;
run;
@mkeintz wrote:
Oops sorry. I see you mean 4 trans WITHIN two days, even if only one of those days is present:
BTW, notice the "trick" here is to update the LAG/DIF queues only at the end of each day's set of transactions. Also, even though the "lag(_ntrans)" is inside an IFN function, the lag queue is updated regardless of whether the ifn test is true. Which is what you want - the lag queue is always updated even if its results are not returned by the IFN function.
data want;
do _ntrans=1 by 1 until(last.transactiondate);
set have;
by id transactiondate;
end;
total_count = sum(_ntrans,ifn(dif(transactiondate)=1 and dif(id)=0,lag(_ntrans),0));
if total_count>=4;
run;
This is making a bit more sense now and I think we're almost there with it.
heres a sample of the output I have:-
ID | TransactionDate | total_count |
1510110787 | 18/11/2016 | 4 |
1510110787 | 30/11/2016 | 4 |
1510141720 | 26/11/2016 | 4 |
1510390852 | 21/11/2016 | 5 |
1510390852 | 22/11/2016 | 7 |
1510390852 | 28/11/2016 | 4 |
1510472816 | 20/11/2016 | 4 |
I think you touched on this earler in the thread about overlapping days.
For ID 1510390852 there are 5 observations on the 21/11/2016 and 2 on the 22/11/2016.
Instead of the output being 5 and 7 for the 21st and 22nd, as these are within a two day window would just like to see 1 observation with a value of 7.
Is this possible?
Obviously for the first Id in that table as the 18th and the 30th arent within a two day window the 2 obervations with a value of 4 are correct.
Your continued help is appreciated
Stret
You want
For any pair of consecutive days output total transactions if the total>=4. This just requires looking back one record.
For any single day not part of such a pair output total transactions if the total>=4. this will require looking ahead one record to see if the next record is a consecutive date. If not it's a singel day window.
Now what if you have 3 consecutive days each with 3 transactions? output the first pair and ignore the 3rd? If you have 4 consecutive days with 3 transactions each, presumably you output two non-overlapping windows each with 6 transactions. Is that correct?
data want (drop=_: nxt_:);
do _ntrans=1 by 1 until(last.transactiondate);
set have;
by id transactiondate;
if eoh=0 then set have
(firstobs=2 keep=id transactiondate rename=(id=nxt_id transactiondate=nxt_td))
end=eoh;
end;
_records_since_output+1;
/* Only add the lagged _ntrans if previous rec is one day prior */
total_count = sum(_ntrans,ifn(dif(id)=0 and dif(transactiondate)=1,lag(_ntrans),0));
nxt_day_present=(id=nxt_id and transactiondate+1=nxt_td);
if total_count>=4 and (nxt_day_present=0 or _records_since_output>=2);
_records_since_output=0;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.