BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
shihabur
Obsidian | Level 7

Below is my sample data. I want to select those rows which have at least 3 consecutive transactions. For example, for id 11 we have transaction on 5th, 6th and 7th day of January. But 12,13,14 does not have 3 consecutive transactions. I want to create a code which will select only id 11 and its all 3 consecutive transactions. I think it can be done in PROC SQL using some kind of counter. But not sure how to go about it.  Any idea about it will be highly appreciated.

 

data transaction;
input @1 id 2.
	  @3 amount 3.
	  @8 date MMDDYY10. ;
format date date9.;
datalines;
11 100 01/05/2015
11 150 01/06/2015
11 200 01/07/2015
12 150 01/25/2015
12 300 01/15/2015
13 100 01/08/2015
14 400 01/02/2015 
;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Two methods:

 


proc sql;
create table consec as
select unique
    a.*
from
    transaction as a inner join
    (select b.id, b.date, d.date 
     from 
        transaction as b inner join
        transaction as c on b.id=c.id and intnx('day',b.date,1)=c.date inner join
        transaction as d on c.id=d.id and intnx('day',c.date,1)=d.date) 
     on a.id=b.id and a.date between b.date and d.date
order by id, date;
select * from consec;
quit;

data sequences;
retain series 0;
set transaction; by id;
prevDate = lag(date);
if first.id then series = series + 1;
else if date ne intnx('day', prevDate, 1) then series = series + 1;
drop prevDate;
run;

data consecutive;
do c = 1 by 1 until(last.series);
    set sequences; by id series;
    end;
do until(last.series);
    set sequences; by id series;
    if c >= 3 then output;
    end;
drop series c;
run;

proc print data=consecutive noobs; run;
PG

View solution in original post

10 REPLIES 10
DaveBirch
Obsidian | Level 7

A couple of questions: 1) do you need to handle instances of multiple transactions on the same day? and 2) how do you want to handle instances where transactions occur on 4 or more consecutive days.

 

Intuitively, a self-join would be better than any form of counter, or alternatively, multi-pass data step solutions. 

shihabur
Obsidian | Level 7

1. Yes. I also need to find situations where there are multiple transactions in one day. But I can find it easily by using a group by and count function within proc sql. But not within same query.

 

2. So, the aim is to get a general process in the end, like 3/4/5 or any number of consecutive day transactions. For time being, I am just starting with 3.

PGStats
Opal | Level 21

Two methods:

 


proc sql;
create table consec as
select unique
    a.*
from
    transaction as a inner join
    (select b.id, b.date, d.date 
     from 
        transaction as b inner join
        transaction as c on b.id=c.id and intnx('day',b.date,1)=c.date inner join
        transaction as d on c.id=d.id and intnx('day',c.date,1)=d.date) 
     on a.id=b.id and a.date between b.date and d.date
order by id, date;
select * from consec;
quit;

data sequences;
retain series 0;
set transaction; by id;
prevDate = lag(date);
if first.id then series = series + 1;
else if date ne intnx('day', prevDate, 1) then series = series + 1;
drop prevDate;
run;

data consecutive;
do c = 1 by 1 until(last.series);
    set sequences; by id series;
    end;
do until(last.series);
    set sequences; by id series;
    if c >= 3 then output;
    end;
drop series c;
run;

proc print data=consecutive noobs; run;
PG
shihabur
Obsidian | Level 7

Thanks a lot. This solves the problem perfectly. 

shihabur
Obsidian | Level 7

I understood the proc sql method. While data step method works fine too, I am still not sure how it is working. I get the first part of the data step method where this is mentioned.

if date ne intnx('day', prevDate, 1) then series = series + 1;

  

but the next part after that where you are using do until, I am not being able to understand how its working, specially why do until is being used twice. Can you give me some hints kindly ?

PGStats
Opal | Level 21

In the first loop, counter c counts the number of rows with the same id and series number, i.e. the length of the sequence. In the second loop, going over exactly the same sequence, the rows are output if the sequence length is >= 3.

PG
RM6
Obsidian | Level 7 RM6
Obsidian | Level 7

hi,

 

I know this is not your requirement, but I have added some more lines of data , hope it helps

data transaction;

input @1 id 2.

@3 amount 3.

@8 date MMDDYY10. ;

format date date9.;

datalines;

 

11 100 01/05/2015

11 150 01/06/2015

11 200 01/07/2015

11 200 01/10/2015

11 100 01/15/2015

11 150 01/16/2015

11 200 01/17/2015

12 150 01/25/2015

12 300 01/15/2015

13 100 01/08/2015

14 400 01/02/2015

;

run;

proc sort data=transaction; by id date; run;

data transaction_1 ;

set transaction;

by id date;

 

dt= lag(date);

if first.id then dt=.;

 

diff = date-dt;

format dt date9.;

run;

data transaction_2;

set transaction_1(where=(diff=1));

if not missing(date) then do;

 

dummy = date;

output;

end;

if not missing(dt) then do;

 

dummy = dt;

output;

end;

format dummy date9.;

run;

proc sql;

create table fin as

select a.* from transaction a

inner join (select distinct id,dummy from transaction_2) b

on a.id= b.id

and a.date=b.dummy

 

;

quit;

 

s_lassen
Meteorite | Level 14

Here is a solution using SET with POINT=:

proc sort data=transaction;
  by id date;
run;

data want;
  set transaction(keep=id date);
  by id date;
  if first.id then
    call missing(date2,date3);
  if last.date and not missing(date3) then do;
    if date-date3=2 then do p2=p3 to _N_;
      set transaction point=p2;
      output;
      end;
    end;
  if first.date then do;
    date3=date2;
    p3=p2;
    date2=date;
    p2=_N_;
    end;
  retain date2 date3 p3 p2;
  drop date2 date3 p3;
run;

 

shihabur
Obsidian | Level 7

Thanks for providing the solution. I am going through the code and can not understand p2/p3 part 

 

if date-date3=2 then do p2=p3 to _N_;
      set transaction point=p2;

 

Have not encountered this before p2 = p3 or p3 =p2. Can you please give some clues about it ?

Ksharp
Super User
data transaction;
input @1 id 2.
	  @3 amount 3.
	  @8 date MMDDYY10. ;
format date date9.;
datalines;
11 100 01/05/2015
11 150 01/06/2015
11 200 01/07/2015
12 150 01/25/2015
12 300 01/15/2015
13 100 01/08/2015
14 400 01/02/2015 
;
run;
data temp;
 set transaction;
 by id;
 dif=dif(date);
 if first.id then dif=1;
run;
data temp1;
 set temp;
 by id dif notsorted;
 group+first.dif;
run;
proc sql;
create table want as
 select *
  from temp1
   group by group
    having sum(dif=1) ge 3;
quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 3746 views
  • 0 likes
  • 6 in conversation