What is the fastest way if I am only interested in computing overlapping regression estimates recursively? I have daily stock return data as follows. The following code automatically downloads and generates the MWE data.
data adjusted(drop=dummy);
input ticker $ @@;
url=compress("https://query1.finance.yahoo.com/v7/finance/download/"||
ticker||"?period1="||dhms("1jan1901"d,0,0,0)-315619200||
'&period2='||dhms("31dec2100"d,23,59,59)-315619200);
infile dummy url filevar=url firstobs=2 dsd truncover end=last;
do until(last);
input @;
if index(_infile_,"null") then input;
else do;
input date yymmdd10. +1 (4*dummy)(:$1.) adjusted dummy;
output;
end;
end;
cards;
^gspc ^irx ba dis hpq ibm ko
;
data return(drop=adjusted);
set adjusted;
if ticker="^irx" then return=lag(adjusted/100)*(date-lag(date))
/(intnx("year",lag(date),1,"same")-lag(date));
else return=ifn(ticker=lag(ticker),adjusted/lag(adjusted)-1,.);
run;
proc sql;
create table rirf as
select i.ticker,i.date,i.return-j.return as rirf,
k.return-j.return as rmrf
from return(where=(ticker not in ("^gspc","^irx"))) i
join return(where=(ticker="^irx")) j on i.date=j.date
join return(where=(ticker="^gspc")) k on i.date=k.date
order by ticker,date;
quit;
which look like
Obs ticker date rirf rmrf
1 ba 19620102 . -0.85%
2 ba 19620103 1.99% 0.23%
3 ba 19620104 -0.99% -0.70%
4 ba 19620105 -1.99% -1.39%
5 ba 19620108 0.23% -0.80%
6 ba 19620109 0.24% 0.04%
7 ba 19620110 -0.01% -0.28%
8 ba 19620111 1.00% 0.59%
9 ba 19620112 2.73% 0.34%
10 ba 19620115 -1.96% -0.22%
and for each ticker in each month, I want to regress rirf
on rmrf
using the trailing 12-month data. In December 1962, I want to compute reg
estimates (I only need the beta) for each ticker using the data from January 1, 1962 to December 31, 1962. The estimates in January 1963 will be based on the data from February 1, 1962 to January 31, 1963, and so forth. Using macro
as follows will be easy but inefficient.
%macro reg;
%let date=%sysfunc(inputn(31dec1962,date9));
%do %until(&date>%sysfunc(inputn(30sep2019,date9)));
proc reg data=rirf(where=(%sysfunc(intnx(month,&date,-11))<=date<=&date))
noprint outest=reg(keep=ticker rmrf);
by ticker;
model rirf=rmrf;
quit;
%if &date=%sysfunc(inputn(31dec1962,date9)) %then %do;
data beta;
set reg;
date=&date;
run;
%end;
%else %do;
data beta;
set beta reg(in=reg);
if reg then date=&date;
run;
%end;
%let date=%sysfunc(intnx(month,&date,1,end));
%end;
%mend;
%reg
I can instead tabulate the ingredient data first for reg
and then feed the data with by
statement. This is much faster than the first one, but the intermediate rirf2
table will require gigantic disk space.
data rirf2;
date="31dec1962"d;
do until(date>"30sep2019"d);
output;
date=intnx("month",date,1,"end");
end;
run;
proc sql undo_policy=none;
create table rirf2 as
select i.date,ticker,rirf,rmrf
from rirf2 i join rirf j on j.date<=i.date<=intnx("month",j.date,11,"end")
order by date,ticker;
quit;
proc reg noprint outest=beta2(keep=date ticker rmrf);
by date ticker;
model rirf=rmrf;
quit;
I can also think of an IML version that repeats use
and close
over again doing inv(x`*x)*x`*y
recursively. There will be tones of different ways, but what is the fastest approach unless I assume parallel computing? I think movstd
in proc expand
seems efficient, but, as far as I know, proc expand
only deals with one variable. I used the first macro approach in the past, am using the second SQL approach now, and want to switch if there is anything even faster in SAS. I wonder if there is a better version from more experienced experts.
Not the answer you want, but I don't think this will work as you expect:
data return(drop=adjusted);
set adjusted;
if ticker="^irx" then return=lag(adjusted)*(date-lag(date))
/(intnx("year",lag(date),1,"same")-lag(date));
else return=ifn(ticker=lag(ticker),adjusted/lag(adjusted)-1,.);
run;
In most cases (like this one), function LAG must be called on every observation.
Also, avoid nasty messages in your LOG and replace
url=compress("https://query1.finance.yahoo.com/v7/finance/download/"||
ticker||"?period1="||dhms("1jan1901"d,0,0,0)-315619200||
'&period2='||dhms("31dec2100"d,23,59,59)-315619200);
with
URL=cats( "https://query1.finance.yahoo.com/v7/finance/download/"
, TICKER
, "?period1="
, dhms("1jan1901"d,0,0,0)-315619200
, '&period2='
, dhms("31dec2100"d,23,59,59)-315619200 );
1. Thanks for cats
. This is better than compress
.
2. I am not using first.ticker
, so it seems the if
-lag
combination is fine. For example,
data return(drop=adjusted);
set adjusted(where=(ticker="^gspc") obs=5)
adjusted(where=(ticker="^irx") obs=5)
adjusted(where=(ticker="ba") obs=5)
adjusted(where=(ticker="dis") obs=5)
adjusted(where=(ticker="hpq") obs=5)
adjusted(where=(ticker="ibm") obs=5)
adjusted(where=(ticker="ko") obs=5);
if ticker="^irx" then return=lag(adjusted/100)*(date-lag(date))
/(intnx("year",lag(date),1,"same")-lag(date));
else return=ifn(ticker=lag(ticker),adjusted/lag(adjusted)-1,.);
run;
Obs ticker date return
1 ^gspc -11690 .
2 ^gspc -11686 0.005663
3 ^gspc -11685 -0.002252
4 ^gspc -11684 -0.009594
5 ^gspc -11683 0.006268
6 ^irx 3 .
7 ^irx 4 0.000123
8 ^irx 5 0.000124
9 ^irx 6 0.000128
10 ^irx 7 0.000127
11 ba 732 .
12 ba 733 0.020000
13 ba 734 -0.009807
14 ba 735 -0.019802
15 ba 738 0.002523
16 dis 732 .
17 dis 733 0.013430
18 dis 734 0.000000
19 dis 735 0.003306
20 dis 738 -0.003295
21 hpq 732 .
22 hpq 733 -0.010654
23 hpq 734 -0.021687
24 hpq 735 -0.022168
25 hpq 738 0.015166
26 ibm 732 .
27 ibm 733 0.008740
28 ibm 734 -0.009963
29 ibm 735 -0.019694
30 ibm 738 -0.018751
31 ko 732 .
32 ko 733 -0.022324
33 ko 734 0.007611
34 ko 735 -0.022661
35 ko 738 -0.010222
The reason IF and LAG() can be used together in your case is that all the observations where the IF statement is true are sequential.
Calling @Rick_SAS
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.