Hello all,
I am trying to identify traders who place transactions in the same month in each of three consecutive years in one company. Once a trader meets the criteria, these three transactions and all his subsequent transactions in that same month in that company should be identified.
Assume I have a sample data below.
data have; input ID STOCK trandate $12.; datalines; 1 1 10/15/2009 1 1 01/01/2010 1 1 01/10/2011 1 1 01/15/2012 1 1 01/01/2013 1 2 01/30/2011 1 2 01/30/2012 1 2 01/30/2012 1 2 01/30/2013 1 2 01/30/2014 1 2 01/30/2015 2 1 01/20/2010 2 1 01/15/2011 2 1 01/16/2012 2 1 02/01/2013 2 2 02/01/2010 2 2 02/10/2011 2 2 02/10/2012 2 2 02/10/2013 2 2 02/10/2014 2 2 01/10/2015
; run;
What I need:
ID Stock trandate type 1 1 10/15/2009 0 1 1 01/01/2010 1 1 1 01/10/2011 1 1 1 01/15/2012 1 1 1 01/01/2013 1 1 2 01/30/2011 1 1 2 01/30/2012 1 1 2 01/30/2012 1 1 2 01/30/2013 1 1 2 01/30/2014 1 1 2 01/30/2015 1 2 1 01/20/2010 0 2 1 01/15/2011 0 2 1 01/16/2012 0 2 1 02/01/2013 0 2 2 02/01/2010 1 2 2 02/10/2011 1 2 2 02/10/2012 1 2 2 02/10/2013 1 2 2 02/10/2014 1 2 2 01/10/2015 0
I attach the code I use to make the classification. The code is very long and I do not think it is efficient. Could you help me out?
Why aren't you trying to flag ID=2, STOCK=1 observations? It looks like those meet the criteria you set up for same month, consecutive years.
@Neal3321 wrote:
Hello all,
I am trying to identify traders who place transactions in the same month in each of three consecutive years in one company. Once a trader meets the criteria, these three transactions and all his subsequent transactions in that same month in that company should be identified.
Assume I have a sample data below.
data have; input ID STOCK trandate $12.; datalines; 1 1 10/15/2009 1 1 01/01/2010 1 1 01/10/2011 1 1 01/15/2012 1 1 01/01/2013 1 2 01/30/2011 1 2 01/30/2012 1 2 01/30/2012 1 2 01/30/2013 1 2 01/30/2014 1 2 01/30/2015 2 1 01/20/2010 2 1 01/15/2011 2 1 01/16/2012 2 1 02/01/2013 2 2 02/01/2010 2 2 02/10/2011 2 2 02/10/2012 2 2 02/10/2013 2 2 02/10/2014 2 2 01/10/2015
; run;What I need:
ID Stock trandate type 1 1 10/15/2009 0 1 1 01/01/2010 1 1 1 01/10/2011 1 1 1 01/15/2012 1 1 1 01/01/2013 1 1 2 01/30/2011 1 1 2 01/30/2012 1 1 2 01/30/2012 1 1 2 01/30/2013 1 1 2 01/30/2014 1 1 2 01/30/2015 1 2 1 01/20/2010 0 2 1 01/15/2011 0 2 1 01/16/2012 0 2 1 02/01/2013 0 2 2 02/01/2010 1 2 2 02/10/2011 1 2 2 02/10/2012 1 2 2 02/10/2013 1 2 2 02/10/2014 1 2 2 01/10/2015 0I attach the code I use to make the classification. The code is very long and I do not think it is efficient. Could you help me out?
As observed by @Astounding , you may have overlooked the succession of January trades for ID=1, stock=1.
Here is a program the meets your objectives:
data have;
input ID STOCK trandate :mmddyy10.;
format trandate date9.;
datalines;
1 1 10/15/2009
1 1 01/01/2010
1 1 01/10/2011
1 1 01/15/2012
1 1 01/01/2013
1 2 01/30/2011
1 2 01/30/2012
1 2 01/30/2012
1 2 01/30/2013
1 2 01/30/2014
1 2 01/30/2015
2 1 01/20/2010
2 1 01/15/2011
2 1 01/16/2012
2 1 02/01/2013
2 2 02/01/2010
2 2 02/10/2011
2 2 02/10/2012
2 2 02/10/2013
2 2 02/10/2014
2 2 01/10/2015
;
data want (drop=_:);
array trades {2008:2015,12} _temporary_;
array type1_start_year {1:12} _temporary_;
call missing(of type1_start_year{*});
call missing(of trades{*});
/* Read a set of transactions for one stock, ...
... and fill in trade dummies by year and month */
do until (last.stock);
set have;
by id stock;
trades{year(trandate),month(trandate)}=1;
end;
/* Look at the 2-way dummy array for 3 consecutive 1's in ...
... any month column, and identify earliest qualifying year*/
do _y=2008 to 2013;
do _m=1 to 12;
if n(trades{_y,_m},trades{_y+1,_m},trades{_y+2,_m})=3 then type1_start_year{_m}=min(_y,type1_start_year{_m});
end;
end;
/* Reread the same transactions, and for qualifying months,
compare the transaction year to the earliest qualifying year */
do until (last.stock);
set have;
by id stock;
if .<type1_start_year{month(trandate)}<=year(trandate) then type=1;
else type=0;
output;
end;
run;
Notes that it reads data in clusters of ID/STOCK, tracking year and month of trades in a 2-way array named TRADES. At the end of the cluster, it examines every column (i.e. month) of TRADES looking for a sequence of 3 ones in 3 successive rows (years). If such a sequence is found then the one-way array TYPE1_START_YEAR has a starting year set in the element corresponding to the qualifying month.
Then the data are re-read, and after examining the TRANDATE for qualifying month and year, is output.
First step: make tranDate a proper SAS date. Then use function intnx to move the dates by month intervals. Here is one way to do this:
data have;
input ID STOCK trandate :mmddyy10.;
format tranDate yymmdd10.;
datalines;
1 1 10/15/2009
1 1 01/01/2010
1 1 01/10/2011
1 1 01/15/2012
1 1 01/01/2013
1 2 01/30/2011
1 2 01/30/2012
1 2 01/30/2012
1 2 01/30/2013
1 2 01/30/2014
1 2 01/30/2015
2 1 01/20/2010
2 1 01/15/2011
2 1 01/16/2012
2 1 02/01/2013
2 2 02/01/2010
2 2 02/10/2011
2 2 02/10/2012
2 2 02/10/2013
2 2 02/10/2014
2 2 01/10/2015
;
proc sql;
create table firsts as
select unique
a.ID, a.STOCK,
intnx("month", a.tranDate, 0) as firstMonth format=mmyy.
from have as a
where
exists (select * from have where ID=a.ID and STOCK=a.STOCK and
intnx("month", a.tranDate, 12) = intnx("month", tranDate, 0) ) and
exists (select * from have where ID=a.ID and STOCK=a.STOCK and
intnx("month", a.tranDate, 24) = intnx("month", tranDate, 0) ) and
exists (select * from have where ID=a.ID and STOCK=a.STOCK and
intnx("month", a.tranDate, 36) = intnx("month", tranDate, 0) );
quit;
data allmonths;
set firsts;
output;
do i = 1 to 3;
firstMonth = intnx("month", firstMonth, 12);
output;
end;
drop i;
run;
proc sort data=allMonths nodupkey; by id stock firstMonth; run;
proc sql;
create table want as
select
a.*,
b.id is not missing as type
from
have as a left join
allMonths as b on a.id=b.id and a.stock=b.stock and
intnx("month", a.tranDate, 0) = b.firstMonth
order by id, stock, tranDate;
quit;
data have;
input ID STOCK trandate : mmddyy12.;
format trandate mmddyy10.;
datalines;
1 1 10/15/2009
1 1 01/01/2010
1 1 01/10/2011
1 1 01/15/2012
1 1 01/01/2013
1 2 01/30/2011
1 2 01/30/2012
1 2 01/30/2012
1 2 01/30/2013
1 2 01/30/2014
1 2 01/30/2015
2 1 01/20/2010
2 1 01/15/2011
2 1 01/16/2012
2 1 02/01/2013
2 2 02/01/2010
2 2 02/10/2011
2 2 02/10/2012
2 2 02/10/2013
2 2 02/10/2014
2 2 01/10/2015
;
run;
proc sort data=have;
by id stock trandate;
run;
data temp;
set have;
by id stock;
year=year(trandate);
month=month(trandate);
if first.stock or dif(year) >1 then group+1;
run;
data temp1;
set temp;
by group month notsorted;
group1+first.month;
run;
proc sql;
create table want as
select id,stock,trandate,ifn(count(*)>3,1,0) as type
from temp1
group by group1
order by 1,2,3;
quit;
Hello all,
Thank you for your suggestions. I did not flag ID2, Stock1 observations because they are in different month (the first three is Jan and the last one is Feb). I attempt to flag transactions in same month for at least three consecutive years.
Thanks again.
It looks like a nice DoW-loop exercise. Note that I've purposely changed the year in the 3rd record for (ID,stock)=(2,1) to 2013 to reveal a gap in consecutive years between this and the prior record. Otherwise, as @Astounding has noted, you end up with type=1.
data have ;
input id stock trandate :mmddyy10. ;
format trandate yymmd7. ;
cards ;
1 1 10/15/2009
1 1 01/01/2010
1 1 01/10/2011
1 1 01/15/2012
1 1 01/01/2013
1 2 01/30/2011
1 2 01/30/2012
1 2 01/30/2012
1 2 01/30/2013
1 2 01/30/2014
1 2 01/30/2015
2 1 01/20/2010
2 1 01/15/2011
2 1 01/16/2013
2 1 02/01/2014
2 2 02/01/2010
2 2 02/10/2011
2 2 02/10/2012
2 2 02/10/2013
2 2 02/10/2014
2 2 01/10/2015
;
run ;
data v / view = v ;
set have ;
_yr = year (trandate) ;
_mon = month (trandate) ;
run ;
data want (drop = _:) ;
do _m = 1 by 1 until (last._mon) ;
do until (last._yr) ;
set v ;
by id stock _mon _yr notsorted ;
_q = sum (_q, 1) ;
end ;
_yr_dif = dif (_yr) ;
if 1 < _m <= 3 and _yr_dif ne 1 then _yr_gap = 1 ;
if _m = 3 and not _yr_gap then type = 1 ;
end ;
type = sum (type, 0) ;
do _q = 1 to _q ;
set v ;
output ;
end ;
run ;
The purpose of the innder DoW-loop by _YR is to scramble consecutive records with the same year withing the same month into counting it as a single year-month occurrence by letting _M increment only for each iteration of the outer DoW by _MON.
Kind regards
Paul D.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.