Hi I have a stock quote data set with exchange information. I want to know how many minutes was a given exchange holding the quote for a given stock. Here's part of my raw data ( the column name is DATE Sym_root Time_M and EX)
proc sort data= rawdata;
by sym_root descending time_m;
run;
data test;
set rawdata;
by sym_root descending time_m;
inforce = abs(dif(time_m));
if first.sym_root
then inforce = max(("16:00:00.000000000"t-time_m),0);
run;
data test;
set test;
if EX = 'A' then A = inforce/60; else A =0;
if EX = 'B' then B = inforce/60; else B =0;
if EX = 'C' then C = inforce/60; else C =0;
if EX = 'J' then J = inforce/60; else J =0;
if EX = 'K' then K = inforce/60; else K =0;
if EX = 'M' then M = inforce/60; else M =0;
if EX = 'N' then N = inforce/60; else N =0;
if EX = 'P' then P = inforce/60; else P =0;
if EX = 'Q' then Q = inforce/60; else Q =0;
if EX = 'T' then T = inforce/60; else T =0;
if EX = 'V' then V = inforce/60; else V =0;
if EX = 'X' then X = inforce/60; else X =0;
if EX = 'Y' then Y = inforce/60; else Y =0;
if EX = 'Z' then Z = inforce/60; else Z =0;
run;
proc report data=test nowd out= timeshare (drop=_BREAK_);
column date sym_root A B C J K M N P Q T V X Y Z;
define date / group;
define sym_root / group;
define A/sum 'A';
define B/sum 'B';
define C/sum 'C';
define J/sum 'J';
define K/sum 'K';
define M/sum 'M';
define N/sum 'N';
define P/sum 'P';
define Q/sum 'Q';
define T/sum 'T';
define V/sum 'V';
define X/sum 'X';
define Y/sum 'Y';
define Z/sum 'Z';
run;
inforce = max(("16:00:00.000000000"t-time_m),0)
to get the last inforce value.
However, one of my code problem is sometimes more than 1 exchange will hold the quote. I tried use lag(inforce) and code ' if inforce=0 then inforce=lag_inforce'. but it doesn't work when there are more than two exchange hold quote at the same time.
I really appreciate if someone can help me solve this issue. please see the attachment of part of rawdata.
If at all possible don't sort large data sets. And while each daily quote file is a moderately size data set, typically with over 6,000 stocks each a new best-bid or best-offer every second, you probably want to look at a range of dates.
So here's how to (1) use the fact that the data is already sorted by sym_root/sym_suffix/time_m, (2) look ahead one record to see what the next time will be, thereby calculating the inforce time (3) use the data step to do the daily summary.
data summaries (keep=sym_root sym_suffix date sumtime_: ntime_:) ;
set have (keep=sym_root sym_suffix date time_m where=('09:30:00't <= time_m <= '16:00:00't));
by date sym_root sym_suffix ex notsorted;
merge have
have (firstobs=2 keep=time_m rename=(time_m=_nxt_time_m));
where '09:30:00't <= time_m <= '16:00:00't ;
retain _exch_list 'ABCJKMNPQTVXYZ';
array _sumtim {*} SUMTIME_A SUMTIME_B SUMTIME_C SUMTIME_J SUMTIME_K
SUMTIME_M SUMTIME_N SUMTIME_P SUMTIME_Q SUMTIME_T
SUMTIME_V SUMTIME_X SUMTIME_Y SUMTIME_Z ;
array _ntim {*} NTIME_A NTIME_B NTIME_C NTIME_J NTIME_K
NTIME_M NTIME_N NTIME_P NTIME_Q NTIME_T
NTIME_V NTIME_X NTIME_Y NTIME_Z ;
retain sumtime_: ntime_: ;
if first.sym_suffix then do _i=1 to dim(_sumtim);
_sumtim{_i}=0; _ntim{_i}=0;
end;
if last.sym_suffix then _nxt_time_m='16:00:00't;
_i=index(_exch_list,ex);
_sumtim{_i}+ (_nxt_time_m-time_m);
if last.ex then _ntim{_i}+1;
if last.sym_suffix then output summaries;
run;
Finally, this data step generates totals, so you don't need the proc report to re-read all the data just to summarize. But if you actually want to keep all the INFORCE time spans, you could use the DATA step to simultaneously generate the summary data set as well as a data set keeping each quote, with the additional inforce time variable. Let me know if this is wanted.
1) define " was a given exchange holding the quote for a given stock"
Since your data does not have an actual quote value "holding" seems problematic.
Something along these lines:
proc summary data=rawdata nway; class sym_root ex date; var time_m; output out=work.rawsummary (drop=_type_ _freq_) range= ; run;
will get the range of your time_m variable (latest - first) for each date, symbol an ex. The value would be seconds though the code above will inherit the time format from time_m.
If you want something where "holding" refers to the same value of a quote you will need to include that quote and add that to the proc summary code.
You N exchange gets a 0 because there is only one record in the data. So there is no actual "duration" to calculate.
Hi ballardw, thank you very much for help!
I see your code. sorry for the unclear explanation, this is what I'm looking for.... Let's say just for one stock A, people bid A with different price quote , and at 9:30:0000, someone in exchange N bid the highest quote.
Sym_root Time Exchange
A 9:30:0000 N
and this best bid quote keep in 3 minute until other exchange Q provide a higher bid quote. and then five minute later, another exchange Z provide a new highest bid quote...
Sym_root Time Exchange
A 9:30:0000 N
A 9:33:0000 Q
A 9:38:0000 Z
what I want to do is I want calculate how many minutes did each exchange hold the best bid quote for a given stock. ( in my example, exchange N hold 3 min and exchange Q hold 5 min)
Because my data is huge and I have more than 3000 stocks in my database, therefore I would like arrange the column as followings. and I also want to convert time from second to minute.
How about
proc sql;
select
sym_root,
date,
ex,
range(time_m) as holdingDuration format=time20.9
from sasforum.rawdata
group by sym_root, date, ex;
quit;
Exchange that Security issued symbol Date of the root quote quote holdingDuration --------------------------------------------------- AKS 01OCT2018 B 0:00:01.005179000 AKS 01OCT2018 J 0:00:00.041457000 AKS 01OCT2018 K 0:00:01.006823000 AKS 01OCT2018 N 0:00:00.000000000 AKS 01OCT2018 P 0:00:01.006679000 AKS 01OCT2018 T 0:00:01.005284000 AKS 01OCT2018 X 0:00:00.039256000 AKS 01OCT2018 Y 0:00:00.039171000 AKS 01OCT2018 Z 0:00:01.001942000
Seems odd to me that all records about a given stock at some exchange on a given day would refer to the same quote...
Hi PG, thank you very much for help me,
Sorry for the unclear explanation, this is what I'm looking for....
Let's say just for one stock A, people bid A with different price quote , and at 9:30:0000, someone in exchange N bid the highest quote.
Sym_root Time Exchange
A 9:30:0000 N
and this best bid quote keep in 3 minute until other exchange Q provide a higher bid quote. and then five minute later, another exchange Z provide a new highest bid quote...
Sym_root Time Exchange
A 9:30:0000 N
A 9:33:0000 Q
A 9:38:0000 Z
what I want to do is I want calculate how many minutes did each exchange hold the best bid quote for a given stock. ( in my example, exchange N hold 3 min and exchange Q hold 5 min)
Because my data is huge and I have more than 3000 stocks in my database, therefore I would like arrange the column as followings. and I also want to convert time from second to minute.
Not sure if you want max or total, and what to do when the exchange repeats, but here is a start
proc sort data=sasforum.rawdata out=rawData; by sym_root date time_m; run;
data temp;
set rawdata(keep=time_m rename=time_m=nextTime);
if _n_ > 1 then do;
set rawdata; by sym_root date;
if not last.date then holdTime = nextTime - time_m;
output;
end;
run;
proc sql;
create table wantList as
select
sym_root,
date,
ex,
max(holdTime) as longestHoldingDuration format=mmss.,
sum(holdTime) as totalHoldingDuration format=mmss.
from temp
group by sym_root, date, ex;
quit;
proc transpose data=wantList out=want(drop=_name_);
by sym_root date;
var totalHoldingDuration;
id ex;
run;
Hi PG, thanks again! I use your code in my dataset, however, I found that the 'holdTime' will be 0 if there are more than one exchange have the highest bid quote at the same time. please see the following picture where I highlight
( at time 9:30:00:004505000, exchange T,X,J have the same highest quote price, and these three exchange hold the best quote for 0.000021 second until exchange T, X,B provide the higher same bid quote price at 9:30:00.004526000 )
I see the code calculate the hold time (0.000021) for exchange J at the time that I highlighted , could you tell me how can I also let SAS calculate the hold time (0.000021 same as exchange J) for exchange T and X? Thank you very much!
If at all possible don't sort large data sets. And while each daily quote file is a moderately size data set, typically with over 6,000 stocks each a new best-bid or best-offer every second, you probably want to look at a range of dates.
So here's how to (1) use the fact that the data is already sorted by sym_root/sym_suffix/time_m, (2) look ahead one record to see what the next time will be, thereby calculating the inforce time (3) use the data step to do the daily summary.
data summaries (keep=sym_root sym_suffix date sumtime_: ntime_:) ;
set have (keep=sym_root sym_suffix date time_m where=('09:30:00't <= time_m <= '16:00:00't));
by date sym_root sym_suffix ex notsorted;
merge have
have (firstobs=2 keep=time_m rename=(time_m=_nxt_time_m));
where '09:30:00't <= time_m <= '16:00:00't ;
retain _exch_list 'ABCJKMNPQTVXYZ';
array _sumtim {*} SUMTIME_A SUMTIME_B SUMTIME_C SUMTIME_J SUMTIME_K
SUMTIME_M SUMTIME_N SUMTIME_P SUMTIME_Q SUMTIME_T
SUMTIME_V SUMTIME_X SUMTIME_Y SUMTIME_Z ;
array _ntim {*} NTIME_A NTIME_B NTIME_C NTIME_J NTIME_K
NTIME_M NTIME_N NTIME_P NTIME_Q NTIME_T
NTIME_V NTIME_X NTIME_Y NTIME_Z ;
retain sumtime_: ntime_: ;
if first.sym_suffix then do _i=1 to dim(_sumtim);
_sumtim{_i}=0; _ntim{_i}=0;
end;
if last.sym_suffix then _nxt_time_m='16:00:00't;
_i=index(_exch_list,ex);
_sumtim{_i}+ (_nxt_time_m-time_m);
if last.ex then _ntim{_i}+1;
if last.sym_suffix then output summaries;
run;
Finally, this data step generates totals, so you don't need the proc report to re-read all the data just to summarize. But if you actually want to keep all the INFORCE time spans, you could use the DATA step to simultaneously generate the summary data set as well as a data set keeping each quote, with the additional inforce time variable. Let me know if this is wanted.
How do you know that two exchanges with the same time stamps means they are both at national best bid (NBB) or best offer (NBO)? Is that how the data set was constructed? I ask because that is not how the TAQ (trade-and-quote) data is distributed by the NYSE.
Also sometimes a tie occurs, not when two exchanges simultaneously issue a new best bid or offer, but rather when one exchange follows (after a time lag) another at NBB or NBO. So the other side of my question is, are you ignoring ties when one exchange lags behind another?
Now you may have considered these issues already and perhaps your strategy is to assign the INFORCE time only to the first exchange at BB or BO, and you don't see how to identify the "first" for (apparent) simultaneity. But, even though the time stamps are identical, the actual data arrive at the consolidated re-publisher in a series (in fact they have different quote identifiers representing their order). Given that much trading is based on high-frequency programmed trading, it's quite likely that actual trading is in response to the first quote in a pair of identical timestamps. I.e. you might want to treat the situation of equal timestamps for the same NBB or NBO, as you would for unequal timestamps at the same NBB or NBO.
. And a programmed
Hi Mkeintz,
I didn't consider the quote sequence number when I program my code. With your explanation, now I have more understanding towards the TAQ data. Thank you very much for cleaning up my confusion! you are a life saver 🙂
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.