How to generate obs from different datasets based on time interval to ...

Servanna · Posted 09-14-2019 01:37 PM

Hi,

I am working with high-frequency data of S&P100 (i.e. 100 datasets with a different ID name but the same variables). Now, I stuck to calculate the 'market return'.

Here is the equation for market return:

which means the market return for the first stock equals the cumulative stock return of 2 to 100 stocks divided by sample size 99 (i.e. market return calculation for each stock will be excluded itself), and so on.

I already have the value of each stock's return for the one-minute interval in three years (all in one dataset), and I want to create and calculate the one-minute market return for each stock. For example, I want to create the new variable called 'market return' for stock 1, then the market return of stock 1 for each one-minute interval should be calculated by the rest 99 stocks' stock return based on the same interval. However, I don't know how to generate the data from multiple datasets at once.

My data looks like:

ID Date interval stock return

appl 01-01-2015 14:58:07

appl 01-01-2015 14:58:15 -0.004632981

appl 01-01-2015 14:59:33

appl 01-01-2015 14:59:42 -0.000391454

appl 11-03-2016 09:13:09

appl 11-03-2016 09:13:22 0.0013331481

appl 11-03-2016 09:14:15

appl 11-03-2016 09:14:47 -0.001321647

appl 10-09-2017 12:03:03

appl 10-09-2017 12:03:17 0.0027265501

appl 10-09-2017 12:04:21

appl 10-09-2017 12:04:53 0.000750953

This is just the data sample, the real dataset has 'Date' from 01-01-2015 to 31-12-2017, and the interval is continuously from 00:00:00 to 23:59:00 for each date. The stock return has been calculated for only the last interval, which means other observations of stock return within the same interval is like missing (' . '). The ID's name will change for different stocks. Given the market return will be calculated by stock return, so my desired result for market return is for the last interval as well.

Thanks in advance. Any help is greatly appreciated!

Astounding · Posted 09-14-2019 09:52 PM

Do you really have data for the same intervals, for each stock? Your data doesn't show more than one stock, but I would guess that defining and matching up the intervals is a major undertaking. You might need to use the final times (where return is defined) and create a dummy variable equal to the same minute, 59 seconds in order to match across stocks.

At any rate, if you can define the intervals uniformly, the rest is relatively easy.

Compute the total return for all 100 stocks for each interval.
Merge that in the larger data set by INTERVAL. That gives you the total return for all 100 stocks, plus the return for the current stock (APPL) on the same observation.
Take the return for all 100 stocks, and subtract the return for the current stock. Then divide by 99.

If it turns out you are missing some of the returns for some of the stocks, the problem becomes only slightly more complex. When computing the total returns for all 100 stocks, you actually need to count the number of stocks with a nonmissing interval,. PROC SUMMARY does all of that.

Servanna · Posted 09-16-2019 01:11 PM

Hi Astounding,

1) Thanks very much for your reply. I apologize for my unclear statement. Actually, there are some data missing within the 100 stocks, most likely for different one month. So, the date will not be continuously from 01-01-2015 to 31-12-2017, instead, some stocks will miss data for Jan 2017 or Nov 2017, etc. The interval will be continuously repeated from 00:00:00 to 23:59:00 for each date. Obviously, not every minute has a trading record. In this case, there will be an issue arise since I want to calculate the market return based on the same one-minute interval (i.e. by date interval). In this case, I have to exclude the stock which missing data for the specific month and include it back for later month calculation.

2) For example, let's say there are 5 stocks (A B C D and E). If there is no missing data, the one minute market return of A equals the accumulate same one minute return of BCDE divided by 4. However, if there is missing data on JAN 2017 for stock B, then for this month the market return for A should equal accumulate return of CDE divided by 3 (given stock B has no available data in this month).

3) The good thing is I already calculate return by using the following code ('midquote' is a variable that I use to calculate return but I did not show it in the following pics):

%macro flower(sourcelib=,from=,going=);
proc sql noprint; /*read datasets in a library*/
create table mytables as
select *
from dictionary.tables
where libname = &sourcelib
order by memname ;

select count(memname)
into:obs
from mytables;

%let obs=&obs.;

select memname
into : memname1-:memname&obs.
from mytables;
quit;

%do i=1 %to &obs.;

data
&going.&&memname&i;
set
&from.&&memname&i;
by date_g_ interval;
if last.interval then midquote = (Avg_BP+Avg_AP)/2;
run;

data
&going.&&memname&i;
set
&from.&&memname&i;
by date_g_ interval;
if last.interval then output;
run;

data
&going.&&memname&i;
set
&going.&&memname&i;
return = log(midquote/lag1(midquote));
run;

data
&going.&&memname&i;
set
&from.&&memname&i;
merge &going.&&memname&i &from.&&memname&i;
by date_g_ interval;
if last.interval then stock_return = return;
run;

%end;
%mend;

%flower(sourcelib='BEE',from=BEE.,going=LEAF.);

4) Here is the selected partial data sample for the ' last.interval ' in the attachment, data of each stock looks pretty much the same, just has different '_RIC'. As I mentioned before, I have 100 datasets for 100 stocks. The biggest issue for me is how to calculate the market return for each stock if there is the issue arise as I described in part 2).

I know I have to use %macro and %do loop to do this. But I don't know how to accumulate the value 'by date_g_ interval' from the rest different datasets when I calculate the market return for stock A at e.g. 01-01-2015 14:00:00, it means I should only accumulate the rest of stocks which have data for this specific time.

I hope I describe it better, and I sincerely appreciate your help.

How to generate obs from different datasets based on time interval to calculate return?

Re: How to generate obs from different datasets based on time interval to calculate return?

Re: How to generate obs from different datasets based on time interval to calculate return?

How to generate obs from different datasets based on time interval to calculate return?

Re: How to generate obs from different datasets based on time interval to calculate return?

Re: How to generate obs from different datasets based on time interval to calculate return?

Registration is open