Solved: Re: How to select rows having transactions on three consecutive days

shihabur · Posted 12-11-2017 10:32 PM

Below is my sample data. I want to select those rows which have at least 3 consecutive transactions. For example, for id 11 we have transaction on 5th, 6th and 7th day of January. But 12,13,14 does not have 3 consecutive transactions. I want to create a code which will select only id 11 and its all 3 consecutive transactions. I think it can be done in PROC SQL using some kind of counter. But not sure how to go about it. Any idea about it will be highly appreciated.

data transaction;
input @1 id 2.
	  @3 amount 3.
	  @8 date MMDDYY10. ;
format date date9.;
datalines;
11 100 01/05/2015
11 150 01/06/2015
11 200 01/07/2015
12 150 01/25/2015
12 300 01/15/2015
13 100 01/08/2015
14 400 01/02/2015 
;
run;

PGStats · Posted 12-11-2017 11:25 PM

Two methods:


proc sql;
create table consec as
select unique
    a.*
from
    transaction as a inner join
    (select b.id, b.date, d.date 
     from 
        transaction as b inner join
        transaction as c on b.id=c.id and intnx('day',b.date,1)=c.date inner join
        transaction as d on c.id=d.id and intnx('day',c.date,1)=d.date) 
     on a.id=b.id and a.date between b.date and d.date
order by id, date;
select * from consec;
quit;

data sequences;
retain series 0;
set transaction; by id;
prevDate = lag(date);
if first.id then series = series + 1;
else if date ne intnx('day', prevDate, 1) then series = series + 1;
drop prevDate;
run;

data consecutive;
do c = 1 by 1 until(last.series);
    set sequences; by id series;
    end;
do until(last.series);
    set sequences; by id series;
    if c >= 3 then output;
    end;
drop series c;
run;

proc print data=consecutive noobs; run;

PG

View solution in original post

DaveBirch · Posted 12-11-2017 10:53 PM

A couple of questions: 1) do you need to handle instances of multiple transactions on the same day? and 2) how do you want to handle instances where transactions occur on 4 or more consecutive days.

Intuitively, a self-join would be better than any form of counter, or alternatively, multi-pass data step solutions.

shihabur · Posted 12-12-2017 07:41 AM

1. Yes. I also need to find situations where there are multiple transactions in one day. But I can find it easily by using a group by and count function within proc sql. But not within same query.

2. So, the aim is to get a general process in the end, like 3/4/5 or any number of consecutive day transactions. For time being, I am just starting with 3.

PGStats · Posted 12-11-2017 11:25 PM

Two methods:


proc sql;
create table consec as
select unique
    a.*
from
    transaction as a inner join
    (select b.id, b.date, d.date 
     from 
        transaction as b inner join
        transaction as c on b.id=c.id and intnx('day',b.date,1)=c.date inner join
        transaction as d on c.id=d.id and intnx('day',c.date,1)=d.date) 
     on a.id=b.id and a.date between b.date and d.date
order by id, date;
select * from consec;
quit;

data sequences;
retain series 0;
set transaction; by id;
prevDate = lag(date);
if first.id then series = series + 1;
else if date ne intnx('day', prevDate, 1) then series = series + 1;
drop prevDate;
run;

data consecutive;
do c = 1 by 1 until(last.series);
    set sequences; by id series;
    end;
do until(last.series);
    set sequences; by id series;
    if c >= 3 then output;
    end;
drop series c;
run;

proc print data=consecutive noobs; run;

PG

shihabur · Posted 12-12-2017 01:26 PM

Thanks a lot. This solves the problem perfectly.

shihabur · Posted 12-13-2017 11:56 AM

I understood the proc sql method. While data step method works fine too, I am still not sure how it is working. I get the first part of the data step method where this is mentioned.

if date ne intnx('day', prevDate, 1) then series = series + 1;

but the next part after that where you are using do until, I am not being able to understand how its working, specially why do until is being used twice. Can you give me some hints kindly ?

PGStats · Posted 12-13-2017 11:04 PM

In the first loop, counter c counts the number of rows with the same id and series number, i.e. the length of the sequence. In the second loop, going over exactly the same sequence, the rows are output if the sequence length is >= 3.

PG

RM6 · Posted 12-12-2017 12:22 AM

hi,

I know this is not your requirement, but I have added some more lines of data , hope it helps

data transaction;

input @1 id 2.

@3 amount 3.

@8 date MMDDYY10. ;

format date date9.;

datalines;

11 100 01/05/2015

11 150 01/06/2015

11 200 01/07/2015

11 200 01/10/2015

11 100 01/15/2015

11 150 01/16/2015

11 200 01/17/2015

12 150 01/25/2015

12 300 01/15/2015

13 100 01/08/2015

14 400 01/02/2015

;

run;

proc sort data=transaction; by id date; run;

data transaction_1 ;

set transaction;

by id date;

dt= lag(date);

if first.id then dt=.;

diff = date-dt;

format dt date9.;

run;

data transaction_2;

set transaction_1(where=(diff=1));

if not missing(date) then do;

dummy = date;

output;

end;

if not missing(dt) then do;

dummy = dt;

output;

end;

format dummy date9.;

run;

proc sql;

create table fin as

select a.* from transaction a

inner join (select distinct id,dummy from transaction_2) b

on a.id= b.id

and a.date=b.dummy

;

quit;

s_lassen · Posted 12-12-2017 06:27 AM

Here is a solution using SET with POINT=:

proc sort data=transaction;
  by id date;
run;

data want;
  set transaction(keep=id date);
  by id date;
  if first.id then
    call missing(date2,date3);
  if last.date and not missing(date3) then do;
    if date-date3=2 then do p2=p3 to _N_;
      set transaction point=p2;
      output;
      end;
    end;
  if first.date then do;
    date3=date2;
    p3=p2;
    date2=date;
    p2=_N_;
    end;
  retain date2 date3 p3 p2;
  drop date2 date3 p3;
run;

shihabur · Posted 12-13-2017 04:43 PM

Thanks for providing the solution. I am going through the code and can not understand p2/p3 part

if date-date3=2 then do p2=p3 to _N_;
      set transaction point=p2;

Have not encountered this before p2 = p3 or p3 =p2. Can you please give some clues about it ?

Ksharp · Posted 12-12-2017 07:47 AM

data transaction;
input @1 id 2.
	  @3 amount 3.
	  @8 date MMDDYY10. ;
format date date9.;
datalines;
11 100 01/05/2015
11 150 01/06/2015
11 200 01/07/2015
12 150 01/25/2015
12 300 01/15/2015
13 100 01/08/2015
14 400 01/02/2015 
;
run;
data temp;
 set transaction;
 by id;
 dif=dif(date);
 if first.id then dif=1;
run;
data temp1;
 set temp;
 by id dif notsorted;
 group+first.dif;
run;
proc sql;
create table want as
 select *
  from temp1
   group by group
    having sum(dif=1) ge 3;
quit;

Registration is open

SAS Training: Just a Click Away