Below is my sample data. I want to select those rows which have at least 3 consecutive transactions. For example, for id 11 we have transaction on 5th, 6th and 7th day of January. But 12,13,14 does not have 3 consecutive transactions. I want to create a code which will select only id 11 and its all 3 consecutive transactions. I think it can be done in PROC SQL using some kind of counter. But not sure how to go about it. Any idea about it will be highly appreciated.
data transaction;
input @1 id 2.
@3 amount 3.
@8 date MMDDYY10. ;
format date date9.;
datalines;
11 100 01/05/2015
11 150 01/06/2015
11 200 01/07/2015
12 150 01/25/2015
12 300 01/15/2015
13 100 01/08/2015
14 400 01/02/2015
;
run;
Two methods:
proc sql;
create table consec as
select unique
a.*
from
transaction as a inner join
(select b.id, b.date, d.date
from
transaction as b inner join
transaction as c on b.id=c.id and intnx('day',b.date,1)=c.date inner join
transaction as d on c.id=d.id and intnx('day',c.date,1)=d.date)
on a.id=b.id and a.date between b.date and d.date
order by id, date;
select * from consec;
quit;
data sequences;
retain series 0;
set transaction; by id;
prevDate = lag(date);
if first.id then series = series + 1;
else if date ne intnx('day', prevDate, 1) then series = series + 1;
drop prevDate;
run;
data consecutive;
do c = 1 by 1 until(last.series);
set sequences; by id series;
end;
do until(last.series);
set sequences; by id series;
if c >= 3 then output;
end;
drop series c;
run;
proc print data=consecutive noobs; run;
A couple of questions: 1) do you need to handle instances of multiple transactions on the same day? and 2) how do you want to handle instances where transactions occur on 4 or more consecutive days.
Intuitively, a self-join would be better than any form of counter, or alternatively, multi-pass data step solutions.
1. Yes. I also need to find situations where there are multiple transactions in one day. But I can find it easily by using a group by and count function within proc sql. But not within same query.
2. So, the aim is to get a general process in the end, like 3/4/5 or any number of consecutive day transactions. For time being, I am just starting with 3.
Two methods:
proc sql;
create table consec as
select unique
a.*
from
transaction as a inner join
(select b.id, b.date, d.date
from
transaction as b inner join
transaction as c on b.id=c.id and intnx('day',b.date,1)=c.date inner join
transaction as d on c.id=d.id and intnx('day',c.date,1)=d.date)
on a.id=b.id and a.date between b.date and d.date
order by id, date;
select * from consec;
quit;
data sequences;
retain series 0;
set transaction; by id;
prevDate = lag(date);
if first.id then series = series + 1;
else if date ne intnx('day', prevDate, 1) then series = series + 1;
drop prevDate;
run;
data consecutive;
do c = 1 by 1 until(last.series);
set sequences; by id series;
end;
do until(last.series);
set sequences; by id series;
if c >= 3 then output;
end;
drop series c;
run;
proc print data=consecutive noobs; run;
Thanks a lot. This solves the problem perfectly.
I understood the proc sql method. While data step method works fine too, I am still not sure how it is working. I get the first part of the data step method where this is mentioned.
if date ne intnx('day', prevDate, 1) then series = series + 1;
but the next part after that where you are using do until, I am not being able to understand how its working, specially why do until is being used twice. Can you give me some hints kindly ?
In the first loop, counter c counts the number of rows with the same id and series number, i.e. the length of the sequence. In the second loop, going over exactly the same sequence, the rows are output if the sequence length is >= 3.
hi,
I know this is not your requirement, but I have added some more lines of data , hope it helps
data transaction;
input @1 id 2.
@3 amount 3.
@8 date MMDDYY10. ;
format date date9.;
datalines;
11 100 01/05/2015
11 150 01/06/2015
11 200 01/07/2015
11 200 01/10/2015
11 100 01/15/2015
11 150 01/16/2015
11 200 01/17/2015
12 150 01/25/2015
12 300 01/15/2015
13 100 01/08/2015
14 400 01/02/2015
;
run;
proc sort data=transaction; by id date; run;
data transaction_1 ;
set transaction;
by id date;
dt= lag(date);
if first.id then dt=.;
diff = date-dt;
format dt date9.;
run;
data transaction_2;
set transaction_1(where=(diff=1));
if not missing(date) then do;
dummy = date;
output;
end;
if not missing(dt) then do;
dummy = dt;
output;
end;
format dummy date9.;
run;
proc sql;
create table fin as
select a.* from transaction a
inner join (select distinct id,dummy from transaction_2) b
on a.id= b.id
and a.date=b.dummy
;
quit;
Here is a solution using SET with POINT=:
proc sort data=transaction; by id date; run; data want; set transaction(keep=id date); by id date; if first.id then call missing(date2,date3); if last.date and not missing(date3) then do; if date-date3=2 then do p2=p3 to _N_; set transaction point=p2; output; end; end; if first.date then do; date3=date2; p3=p2; date2=date; p2=_N_; end; retain date2 date3 p3 p2; drop date2 date3 p3; run;
Thanks for providing the solution. I am going through the code and can not understand p2/p3 part
if date-date3=2 then do p2=p3 to _N_; set transaction point=p2;
Have not encountered this before p2 = p3 or p3 =p2. Can you please give some clues about it ?
data transaction;
input @1 id 2.
@3 amount 3.
@8 date MMDDYY10. ;
format date date9.;
datalines;
11 100 01/05/2015
11 150 01/06/2015
11 200 01/07/2015
12 150 01/25/2015
12 300 01/15/2015
13 100 01/08/2015
14 400 01/02/2015
;
run;
data temp;
set transaction;
by id;
dif=dif(date);
if first.id then dif=1;
run;
data temp1;
set temp;
by id dif notsorted;
group+first.dif;
run;
proc sql;
create table want as
select *
from temp1
group by group
having sum(dif=1) ge 3;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.