Re: How to identify the event occurred in four consecutive years?

Neal3321 · Posted 11-04-2019 07:59 PM

Hello all,

I am trying to identify traders who place transactions in the same month in each of three consecutive years in one company. Once a trader meets the criteria, these three transactions and all his subsequent transactions in that same month in that company should be identified.

Assume I have a sample data below.

data have;
input ID  STOCK   trandate $12.;
datalines;
1    1   10/15/2009
1    1   01/01/2010
1    1   01/10/2011
1    1   01/15/2012
1    1   01/01/2013
1    2   01/30/2011
1    2   01/30/2012
1    2   01/30/2012
1    2   01/30/2013
1    2   01/30/2014
1    2   01/30/2015
2    1   01/20/2010
2    1   01/15/2011
2    1   01/16/2012
2    1   02/01/2013
2    2   02/01/2010
2    2   02/10/2011
2    2   02/10/2012
2    2   02/10/2013
2    2   02/10/2014
2    2   01/10/2015

;
run;

What I need:

ID Stock  trandate type
1    1   10/15/2009 0
1    1   01/01/2010 1
1    1   01/10/2011 1
1    1   01/15/2012 1
1    1   01/01/2013 1
1    2   01/30/2011 1
1    2   01/30/2012 1
1    2   01/30/2012 1
1    2   01/30/2013 1
1    2   01/30/2014 1
1    2   01/30/2015 1
2    1   01/20/2010 0
2    1   01/15/2011 0
2    1   01/16/2012 0
2    1   02/01/2013 0
2    2   02/01/2010 1
2    2   02/10/2011 1
2    2   02/10/2012 1
2    2   02/10/2013 1
2    2   02/10/2014 1
2    2   01/10/2015 0

I attach the code I use to make the classification. The code is very long and I do not think it is efficient. Could you help me out?

Astounding · Posted 11-04-2019 10:55 PM

Why aren't you trying to flag ID=2, STOCK=1 observations? It looks like those meet the criteria you set up for same month, consecutive years.

mkeintz · Posted 11-04-2019 11:42 PM

@Neal3321 wrote:

Hello all,

I am trying to identify traders who place transactions in the same month in each of three consecutive years in one company. Once a trader meets the criteria, these three transactions and all his subsequent transactions in that same month in that company should be identified.

Assume I have a sample data below.
data have;
input ID  STOCK   trandate $12.;
datalines;
1    1   10/15/2009
1    1   01/01/2010
1    1   01/10/2011
1    1   01/15/2012
1    1   01/01/2013
1    2   01/30/2011
1    2   01/30/2012
1    2   01/30/2012
1    2   01/30/2013
1    2   01/30/2014
1    2   01/30/2015
2    1   01/20/2010
2    1   01/15/2011
2    1   01/16/2012
2    1   02/01/2013
2    2   02/01/2010
2    2   02/10/2011
2    2   02/10/2012
2    2   02/10/2013
2    2   02/10/2014
2    2   01/10/2015

;
run;
What I need:
ID Stock  trandate type
1    1   10/15/2009 0
1    1   01/01/2010 1
1    1   01/10/2011 1
1    1   01/15/2012 1
1    1   01/01/2013 1
1    2   01/30/2011 1
1    2   01/30/2012 1
1    2   01/30/2012 1
1    2   01/30/2013 1
1    2   01/30/2014 1
1    2   01/30/2015 1
2    1   01/20/2010 0
2    1   01/15/2011 0
2    1   01/16/2012 0
2    1   02/01/2013 0
2    2   02/01/2010 1
2    2   02/10/2011 1
2    2   02/10/2012 1
2    2   02/10/2013 1
2    2   02/10/2014 1
2    2   01/10/2015 0
I attach the code I use to make the classification. The code is very long and I do not think it is efficient. Could you help me out?

As observed by @Astounding , you may have overlooked the succession of January trades for ID=1, stock=1.

Here is a program the meets your objectives:

data have;
input ID  STOCK   trandate :mmddyy10.;
format trandate date9.;
datalines;
1    1   10/15/2009
1    1   01/01/2010
1    1   01/10/2011
1    1   01/15/2012
1    1   01/01/2013
1    2   01/30/2011
1    2   01/30/2012
1    2   01/30/2012
1    2   01/30/2013
1    2   01/30/2014
1    2   01/30/2015
2    1   01/20/2010
2    1   01/15/2011
2    1   01/16/2012
2    1   02/01/2013
2    2   02/01/2010
2    2   02/10/2011
2    2   02/10/2012
2    2   02/10/2013
2    2   02/10/2014
2    2   01/10/2015
;

data want (drop=_:);
  array trades {2008:2015,12} _temporary_;
  array type1_start_year {1:12} _temporary_;

  call missing(of type1_start_year{*});
  call missing(of trades{*});

  /* Read a set of transactions for one stock, ...
     ... and fill in trade dummies by year and month */
  do until (last.stock);
    set have;
    by id stock;
    trades{year(trandate),month(trandate)}=1;
  end;

  /* Look at the 2-way dummy array for 3 consecutive 1's in ...
     ... any month column, and identify earliest qualifying year*/
  do _y=2008 to 2013;
    do _m=1 to 12;
      if n(trades{_y,_m},trades{_y+1,_m},trades{_y+2,_m})=3 then type1_start_year{_m}=min(_y,type1_start_year{_m});
    end;
  end;

  /* Reread the same transactions, and for qualifying months, 
     compare the transaction year to the earliest qualifying year */
  do until (last.stock);
    set have;
    by id stock;
    if .<type1_start_year{month(trandate)}<=year(trandate) then type=1;
    else type=0;
    output;
  end;
run;

Notes that it reads data in clusters of ID/STOCK, tracking year and month of trades in a 2-way array named TRADES. At the end of the cluster, it examines every column (i.e. month) of TRADES looking for a sequence of 3 ones in 3 successive rows (years). If such a sequence is found then the one-way array TYPE1_START_YEAR has a starting year set in the element corresponding to the qualifying month.

Then the data are re-read, and after examining the TRANDATE for qualifying month and year, is output.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

PGStats · Posted 11-04-2019 11:43 PM

First step: make tranDate a proper SAS date. Then use function intnx to move the dates by month intervals. Here is one way to do this:

data have;
input ID STOCK trandate :mmddyy10.;
format tranDate yymmdd10.;
datalines;
1    1   10/15/2009
1    1   01/01/2010
1    1   01/10/2011
1    1   01/15/2012
1    1   01/01/2013
1    2   01/30/2011
1    2   01/30/2012
1    2   01/30/2012
1    2   01/30/2013
1    2   01/30/2014
1    2   01/30/2015
2    1   01/20/2010
2    1   01/15/2011
2    1   01/16/2012
2    1   02/01/2013
2    2   02/01/2010
2    2   02/10/2011
2    2   02/10/2012
2    2   02/10/2013
2    2   02/10/2014
2    2   01/10/2015
;

proc sql;
create table firsts as
select unique
	a.ID, a.STOCK,
	intnx("month", a.tranDate, 0) as firstMonth format=mmyy.
from have as a
where 
	exists (select * from have where ID=a.ID and STOCK=a.STOCK and 
		intnx("month", a.tranDate, 12) = intnx("month", tranDate, 0) ) and
	exists (select * from have where ID=a.ID and STOCK=a.STOCK and 
		intnx("month", a.tranDate, 24) = intnx("month", tranDate, 0) ) and
	exists (select * from have where ID=a.ID and STOCK=a.STOCK and 
		intnx("month", a.tranDate, 36) = intnx("month", tranDate, 0) );
quit;
		
data allmonths;
set firsts;
output;
do i = 1 to 3;
	firstMonth = intnx("month", firstMonth, 12);
	output;
	end;
drop i;
run;

proc sort data=allMonths nodupkey; by id stock firstMonth; run;
		
proc sql;
create table want as
select 
	a.*,
	b.id is not missing as type
from 
	have as a left join 
	allMonths as b on a.id=b.id and a.stock=b.stock and
		intnx("month", a.tranDate, 0) = b.firstMonth
order by id, stock, tranDate;
quit;

PG

Ksharp · Posted 11-05-2019 07:28 AM

data have;
input ID  STOCK   trandate : mmddyy12.;
format trandate mmddyy10.;
datalines;
1    1   10/15/2009
1    1   01/01/2010
1    1   01/10/2011
1    1   01/15/2012
1    1   01/01/2013
1    2   01/30/2011
1    2   01/30/2012
1    2   01/30/2012
1    2   01/30/2013
1    2   01/30/2014
1    2   01/30/2015
2    1   01/20/2010
2    1   01/15/2011
2    1   01/16/2012
2    1   02/01/2013
2    2   02/01/2010
2    2   02/10/2011
2    2   02/10/2012
2    2   02/10/2013
2    2   02/10/2014
2    2   01/10/2015
;
run;
proc sort data=have;
by id stock trandate;
run;
data temp;
 set have;
 by id stock;
 year=year(trandate);
 month=month(trandate);
 if first.stock or dif(year) >1 then group+1;
run;
data temp1;
 set temp;
 by group month notsorted;
 group1+first.month;
run;
proc sql;
create table want as
select id,stock,trandate,ifn(count(*)>3,1,0) as type
 from temp1
  group by group1
   order by 1,2,3;
quit;

Neal3321 · Posted 11-05-2019 10:56 AM

Hello all,

Thank you for your suggestions. I did not flag ID2, Stock1 observations because they are in different month (the first three is Jan and the last one is Feb). I attempt to flag transactions in same month for at least three consecutive years.

Thanks again.

hashman · Posted 11-05-2019 11:28 AM

@Neal3321:

It looks like a nice DoW-loop exercise. Note that I've purposely changed the year in the 3rd record for (ID,stock)=(2,1) to 2013 to reveal a gap in consecutive years between this and the prior record. Otherwise, as @Astounding has noted, you end up with type=1.

data have ;                                                                                                                             
  input id stock trandate :mmddyy10. ;                                                                                                  
  format trandate yymmd7. ;                                                                                                             
  cards ;                                                                                                                               
1  1  10/15/2009                                                                                                                        
1  1  01/01/2010                                                                                                                        
1  1  01/10/2011                                                                                                                        
1  1  01/15/2012                                                                                                                        
1  1  01/01/2013                                                                                                                        
1  2  01/30/2011                                                                                                                        
1  2  01/30/2012                                                                                                                        
1  2  01/30/2012                                                                                                                        
1  2  01/30/2013                                                                                                                        
1  2  01/30/2014                                                                                                                        
1  2  01/30/2015                                                                                                                        
2  1  01/20/2010                                                                                                                        
2  1  01/15/2011                                                                                                                        
2  1  01/16/2013                                                                                                                        
2  1  02/01/2014                                                                                                                        
2  2  02/01/2010                                                                                                                        
2  2  02/10/2011                                                                                                                        
2  2  02/10/2012                                                                                                                        
2  2  02/10/2013                                                                                                                        
2  2  02/10/2014                                                                                                                        
2  2  01/10/2015                                                                                                                        
;                                                                                                                                       
run ;                                                                                                                                   
                                                                                                                                        
data v / view = v ;                                                                                                                     
  set have ;                                                                                                                            
  _yr  = year  (trandate) ;                                                                                                             
  _mon = month (trandate) ;                                                                                                             
run ;                                                                                                                                   
                                                                                                                                        
data want (drop = _:) ;                                                                                                                 
  do _m = 1 by 1 until (last._mon) ;                                                                                                    
    do until (last._yr) ;                                                                                                               
      set v ;                                                                                                                           
      by id stock _mon _yr notsorted ;                                                                                                  
      _q = sum (_q, 1) ;                                                                                                                
    end ;                                                                                                                               
    _yr_dif = dif (_yr) ;                                                                                                               
    if 1 < _m <= 3 and _yr_dif ne 1 then _yr_gap = 1 ;                                                                                  
    if _m = 3 and not _yr_gap then type = 1 ;                                                                                           
  end ;                                                                                                                                 
  type = sum (type, 0) ;                                                                                                                
  do _q = 1 to _q ;                                                                                                                     
    set v ;                                                                                                                             
    output ;                                                                                                                            
  end ;                                                                                                                                 
run ;

The purpose of the innder DoW-loop by _YR is to scramble consecutive records with the same year withing the same month into counting it as a single year-month occurrence by letting _M increment only for each iteration of the outer DoW by _MON.

Kind regards

Paul D.

How to identify the event occurred in four consecutive years?