Hello all,
I am trying to identify traders who place transactions in the same month in each of three consecutive years in one company. Once a trader meets the criteria, these three transactions and all his subsequent transactions in that same month in that company should be identified.
Assume I have a sample data below.
data have; input ID STOCK trandate $12.; datalines; 1 1 10/15/2009 1 1 01/01/2010 1 1 01/10/2011 1 1 01/15/2012 1 1 01/01/2013 1 2 01/30/2011 1 2 01/30/2012 1 2 01/30/2012 1 2 01/30/2013 1 2 01/30/2014 1 2 01/30/2015 2 1 01/20/2010 2 1 01/15/2011 2 1 01/16/2012 2 1 02/01/2013 2 2 02/01/2010 2 2 02/10/2011 2 2 02/10/2012 2 2 02/10/2013 2 2 02/10/2014 2 2 01/10/2015
; run;
What I need:
ID Stock trandate type 1 1 10/15/2009 0 1 1 01/01/2010 1 1 1 01/10/2011 1 1 1 01/15/2012 1 1 1 01/01/2013 1 1 2 01/30/2011 1 1 2 01/30/2012 1 1 2 01/30/2012 1 1 2 01/30/2013 1 1 2 01/30/2014 1 1 2 01/30/2015 1 2 1 01/20/2010 0 2 1 01/15/2011 0 2 1 01/16/2012 0 2 1 02/01/2013 0 2 2 02/01/2010 1 2 2 02/10/2011 1 2 2 02/10/2012 1 2 2 02/10/2013 1 2 2 02/10/2014 1 2 2 01/10/2015 0
I attach the code I use to make the classification. The code is very long and I do not think it is efficient. Could you help me out?
Why aren't you trying to flag ID=2, STOCK=1 observations? It looks like those meet the criteria you set up for same month, consecutive years.
@Neal3321 wrote:
Hello all,
I am trying to identify traders who place transactions in the same month in each of three consecutive years in one company. Once a trader meets the criteria, these three transactions and all his subsequent transactions in that same month in that company should be identified.
Assume I have a sample data below.
data have; input ID STOCK trandate $12.; datalines; 1 1 10/15/2009 1 1 01/01/2010 1 1 01/10/2011 1 1 01/15/2012 1 1 01/01/2013 1 2 01/30/2011 1 2 01/30/2012 1 2 01/30/2012 1 2 01/30/2013 1 2 01/30/2014 1 2 01/30/2015 2 1 01/20/2010 2 1 01/15/2011 2 1 01/16/2012 2 1 02/01/2013 2 2 02/01/2010 2 2 02/10/2011 2 2 02/10/2012 2 2 02/10/2013 2 2 02/10/2014 2 2 01/10/2015
; run;What I need:
ID Stock trandate type 1 1 10/15/2009 0 1 1 01/01/2010 1 1 1 01/10/2011 1 1 1 01/15/2012 1 1 1 01/01/2013 1 1 2 01/30/2011 1 1 2 01/30/2012 1 1 2 01/30/2012 1 1 2 01/30/2013 1 1 2 01/30/2014 1 1 2 01/30/2015 1 2 1 01/20/2010 0 2 1 01/15/2011 0 2 1 01/16/2012 0 2 1 02/01/2013 0 2 2 02/01/2010 1 2 2 02/10/2011 1 2 2 02/10/2012 1 2 2 02/10/2013 1 2 2 02/10/2014 1 2 2 01/10/2015 0I attach the code I use to make the classification. The code is very long and I do not think it is efficient. Could you help me out?
As observed by @Astounding , you may have overlooked the succession of January trades for ID=1, stock=1.
Here is a program the meets your objectives:
data have;
input ID STOCK trandate :mmddyy10.;
format trandate date9.;
datalines;
1 1 10/15/2009
1 1 01/01/2010
1 1 01/10/2011
1 1 01/15/2012
1 1 01/01/2013
1 2 01/30/2011
1 2 01/30/2012
1 2 01/30/2012
1 2 01/30/2013
1 2 01/30/2014
1 2 01/30/2015
2 1 01/20/2010
2 1 01/15/2011
2 1 01/16/2012
2 1 02/01/2013
2 2 02/01/2010
2 2 02/10/2011
2 2 02/10/2012
2 2 02/10/2013
2 2 02/10/2014
2 2 01/10/2015
;
data want (drop=_:);
array trades {2008:2015,12} _temporary_;
array type1_start_year {1:12} _temporary_;
call missing(of type1_start_year{*});
call missing(of trades{*});
/* Read a set of transactions for one stock, ...
... and fill in trade dummies by year and month */
do until (last.stock);
set have;
by id stock;
trades{year(trandate),month(trandate)}=1;
end;
/* Look at the 2-way dummy array for 3 consecutive 1's in ...
... any month column, and identify earliest qualifying year*/
do _y=2008 to 2013;
do _m=1 to 12;
if n(trades{_y,_m},trades{_y+1,_m},trades{_y+2,_m})=3 then type1_start_year{_m}=min(_y,type1_start_year{_m});
end;
end;
/* Reread the same transactions, and for qualifying months,
compare the transaction year to the earliest qualifying year */
do until (last.stock);
set have;
by id stock;
if .<type1_start_year{month(trandate)}<=year(trandate) then type=1;
else type=0;
output;
end;
run;
Notes that it reads data in clusters of ID/STOCK, tracking year and month of trades in a 2-way array named TRADES. At the end of the cluster, it examines every column (i.e. month) of TRADES looking for a sequence of 3 ones in 3 successive rows (years). If such a sequence is found then the one-way array TYPE1_START_YEAR has a starting year set in the element corresponding to the qualifying month.
Then the data are re-read, and after examining the TRANDATE for qualifying month and year, is output.
First step: make tranDate a proper SAS date. Then use function intnx to move the dates by month intervals. Here is one way to do this:
data have;
input ID STOCK trandate :mmddyy10.;
format tranDate yymmdd10.;
datalines;
1 1 10/15/2009
1 1 01/01/2010
1 1 01/10/2011
1 1 01/15/2012
1 1 01/01/2013
1 2 01/30/2011
1 2 01/30/2012
1 2 01/30/2012
1 2 01/30/2013
1 2 01/30/2014
1 2 01/30/2015
2 1 01/20/2010
2 1 01/15/2011
2 1 01/16/2012
2 1 02/01/2013
2 2 02/01/2010
2 2 02/10/2011
2 2 02/10/2012
2 2 02/10/2013
2 2 02/10/2014
2 2 01/10/2015
;
proc sql;
create table firsts as
select unique
a.ID, a.STOCK,
intnx("month", a.tranDate, 0) as firstMonth format=mmyy.
from have as a
where
exists (select * from have where ID=a.ID and STOCK=a.STOCK and
intnx("month", a.tranDate, 12) = intnx("month", tranDate, 0) ) and
exists (select * from have where ID=a.ID and STOCK=a.STOCK and
intnx("month", a.tranDate, 24) = intnx("month", tranDate, 0) ) and
exists (select * from have where ID=a.ID and STOCK=a.STOCK and
intnx("month", a.tranDate, 36) = intnx("month", tranDate, 0) );
quit;
data allmonths;
set firsts;
output;
do i = 1 to 3;
firstMonth = intnx("month", firstMonth, 12);
output;
end;
drop i;
run;
proc sort data=allMonths nodupkey; by id stock firstMonth; run;
proc sql;
create table want as
select
a.*,
b.id is not missing as type
from
have as a left join
allMonths as b on a.id=b.id and a.stock=b.stock and
intnx("month", a.tranDate, 0) = b.firstMonth
order by id, stock, tranDate;
quit;
data have;
input ID STOCK trandate : mmddyy12.;
format trandate mmddyy10.;
datalines;
1 1 10/15/2009
1 1 01/01/2010
1 1 01/10/2011
1 1 01/15/2012
1 1 01/01/2013
1 2 01/30/2011
1 2 01/30/2012
1 2 01/30/2012
1 2 01/30/2013
1 2 01/30/2014
1 2 01/30/2015
2 1 01/20/2010
2 1 01/15/2011
2 1 01/16/2012
2 1 02/01/2013
2 2 02/01/2010
2 2 02/10/2011
2 2 02/10/2012
2 2 02/10/2013
2 2 02/10/2014
2 2 01/10/2015
;
run;
proc sort data=have;
by id stock trandate;
run;
data temp;
set have;
by id stock;
year=year(trandate);
month=month(trandate);
if first.stock or dif(year) >1 then group+1;
run;
data temp1;
set temp;
by group month notsorted;
group1+first.month;
run;
proc sql;
create table want as
select id,stock,trandate,ifn(count(*)>3,1,0) as type
from temp1
group by group1
order by 1,2,3;
quit;
Hello all,
Thank you for your suggestions. I did not flag ID2, Stock1 observations because they are in different month (the first three is Jan and the last one is Feb). I attempt to flag transactions in same month for at least three consecutive years.
Thanks again.
It looks like a nice DoW-loop exercise. Note that I've purposely changed the year in the 3rd record for (ID,stock)=(2,1) to 2013 to reveal a gap in consecutive years between this and the prior record. Otherwise, as @Astounding has noted, you end up with type=1.
data have ; input id stock trandate :mmddyy10. ; format trandate yymmd7. ; cards ; 1 1 10/15/2009 1 1 01/01/2010 1 1 01/10/2011 1 1 01/15/2012 1 1 01/01/2013 1 2 01/30/2011 1 2 01/30/2012 1 2 01/30/2012 1 2 01/30/2013 1 2 01/30/2014 1 2 01/30/2015 2 1 01/20/2010 2 1 01/15/2011 2 1 01/16/2013 2 1 02/01/2014 2 2 02/01/2010 2 2 02/10/2011 2 2 02/10/2012 2 2 02/10/2013 2 2 02/10/2014 2 2 01/10/2015 ; run ; data v / view = v ; set have ; _yr = year (trandate) ; _mon = month (trandate) ; run ; data want (drop = _:) ; do _m = 1 by 1 until (last._mon) ; do until (last._yr) ; set v ; by id stock _mon _yr notsorted ; _q = sum (_q, 1) ; end ; _yr_dif = dif (_yr) ; if 1 < _m <= 3 and _yr_dif ne 1 then _yr_gap = 1 ; if _m = 3 and not _yr_gap then type = 1 ; end ; type = sum (type, 0) ; do _q = 1 to _q ; set v ; output ; end ; run ;
The purpose of the innder DoW-loop by _YR is to scramble consecutive records with the same year withing the same month into counting it as a single year-month occurrence by letting _M increment only for each iteration of the outer DoW by _MON.
Kind regards
Paul D.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.