BookmarkSubscribeRSS Feed
Junyong
Pyrite | Level 9

What is the fastest way if I am only interested in computing overlapping regression estimates recursively? I have daily stock return data as follows. The following code automatically downloads and generates the MWE data.

data adjusted(drop=dummy);
	input ticker $ @@;
	url=compress("https://query1.finance.yahoo.com/v7/finance/download/"||
		ticker||"?period1="||dhms("1jan1901"d,0,0,0)-315619200||
		'&period2='||dhms("31dec2100"d,23,59,59)-315619200);
	infile dummy url filevar=url firstobs=2 dsd truncover end=last;
	do until(last);
		input @;
		if index(_infile_,"null") then input;
		else do;
			input date yymmdd10. +1 (4*dummy)(:$1.) adjusted dummy;
			output;
		end;
	end;
cards;
^gspc ^irx ba dis hpq ibm ko
;

data return(drop=adjusted);
	set adjusted;
	if ticker="^irx" then return=lag(adjusted/100)*(date-lag(date))
		/(intnx("year",lag(date),1,"same")-lag(date));
	else return=ifn(ticker=lag(ticker),adjusted/lag(adjusted)-1,.);
run;

proc sql;
	create table rirf as
		select i.ticker,i.date,i.return-j.return as rirf,
			k.return-j.return as rmrf
		from return(where=(ticker not in ("^gspc","^irx"))) i
			join return(where=(ticker="^irx")) j on i.date=j.date
			join return(where=(ticker="^gspc")) k on i.date=k.date
		order by ticker,date;
quit;

which look like

                Obs    ticker        date        rirf        rmrf

                  1      ba      19620102       .         -0.85%
                  2      ba      19620103      1.99%       0.23%
                  3      ba      19620104     -0.99%      -0.70%
                  4      ba      19620105     -1.99%      -1.39%
                  5      ba      19620108      0.23%      -0.80%
                  6      ba      19620109      0.24%       0.04%
                  7      ba      19620110     -0.01%      -0.28%
                  8      ba      19620111      1.00%       0.59%
                  9      ba      19620112      2.73%       0.34%
                 10      ba      19620115     -1.96%      -0.22%

and for each ticker in each month, I want to regress rirf on rmrf using the trailing 12-month data. In December 1962, I want to compute reg estimates (I only need the beta) for each ticker using the data from January 1, 1962 to December 31, 1962. The estimates in January 1963 will be based on the data from February 1, 1962 to January 31, 1963, and so forth. Using macro as follows will be easy but inefficient.

%macro reg;

%let date=%sysfunc(inputn(31dec1962,date9));
%do %until(&date>%sysfunc(inputn(30sep2019,date9)));

proc reg data=rirf(where=(%sysfunc(intnx(month,&date,-11))<=date<=&date))
	noprint outest=reg(keep=ticker rmrf);
	by ticker;
	model rirf=rmrf;
quit;

%if &date=%sysfunc(inputn(31dec1962,date9)) %then %do;

data beta;
	set reg;
	date=&date;
run;

%end;

%else %do;

data beta;
	set beta reg(in=reg);
	if reg then date=&date;
run;

%end;

%let date=%sysfunc(intnx(month,&date,1,end));
%end;

%mend;

%reg

I can instead tabulate the ingredient data first for reg and then feed the data with by statement. This is much faster than the first one, but the intermediate rirf2 table will require gigantic disk space.

data rirf2;
	date="31dec1962"d;
	do until(date>"30sep2019"d);
		output;
		date=intnx("month",date,1,"end");
	end;
run;

proc sql undo_policy=none;
	create table rirf2 as
		select i.date,ticker,rirf,rmrf
		from rirf2 i join rirf j on j.date<=i.date<=intnx("month",j.date,11,"end")
		order by date,ticker;
quit;

proc reg noprint outest=beta2(keep=date ticker rmrf);
	by date ticker;
	model rirf=rmrf;
quit;

I can also think of an IML version that repeats use and close over again doing inv(x`*x)*x`*y recursively. There will be tones of different ways, but what is the fastest approach unless I assume parallel computing? I think movstd in proc expand seems efficient, but, as far as I know, proc expand only deals with one variable. I used the first macro approach in the past, am using the second SQL approach now, and want to switch if there is anything even faster in SAS. I wonder if there is a better version from more experienced experts.

4 REPLIES 4
ChrisNZ
Tourmaline | Level 20

Not the answer you want, but I don't think this will work as you expect:

data return(drop=adjusted);
	set adjusted;
	if ticker="^irx" then return=lag(adjusted)*(date-lag(date))
		/(intnx("year",lag(date),1,"same")-lag(date));
	else return=ifn(ticker=lag(ticker),adjusted/lag(adjusted)-1,.);
run;

In most cases (like this one), function LAG must be called on every observation.

 

Also, avoid nasty messages in your LOG and replace 

url=compress("https://query1.finance.yahoo.com/v7/finance/download/"||
		ticker||"?period1="||dhms("1jan1901"d,0,0,0)-315619200||
		'&period2='||dhms("31dec2100"d,23,59,59)-315619200);

with

URL=cats( "https://query1.finance.yahoo.com/v7/finance/download/"
        , TICKER
        , "?period1="
        , dhms("1jan1901"d,0,0,0)-315619200
        , '&period2='
        , dhms("31dec2100"d,23,59,59)-315619200 );

 

Junyong
Pyrite | Level 9

1. Thanks for cats. This is better than compress.

2. I am not using first.ticker, so it seems the if-lag combination is fine. For example,

data return(drop=adjusted);
	set adjusted(where=(ticker="^gspc") obs=5)
		adjusted(where=(ticker="^irx") obs=5)
		adjusted(where=(ticker="ba") obs=5)
		adjusted(where=(ticker="dis") obs=5)
		adjusted(where=(ticker="hpq") obs=5)
		adjusted(where=(ticker="ibm") obs=5)
		adjusted(where=(ticker="ko") obs=5);
	if ticker="^irx" then return=lag(adjusted/100)*(date-lag(date))
		/(intnx("year",lag(date),1,"same")-lag(date));
	else return=ifn(ticker=lag(ticker),adjusted/lag(adjusted)-1,.);
run;
                     Obs    ticker      date      return

                       1    ^gspc     -11690      .
                       2    ^gspc     -11686     0.005663
                       3    ^gspc     -11685    -0.002252
                       4    ^gspc     -11684    -0.009594
                       5    ^gspc     -11683     0.006268
                       6    ^irx           3      .
                       7    ^irx           4     0.000123
                       8    ^irx           5     0.000124
                       9    ^irx           6     0.000128
                      10    ^irx           7     0.000127
                      11    ba           732      .
                      12    ba           733     0.020000
                      13    ba           734    -0.009807
                      14    ba           735    -0.019802
                      15    ba           738     0.002523
                      16    dis          732      .
                      17    dis          733     0.013430
                      18    dis          734     0.000000
                      19    dis          735     0.003306
                      20    dis          738    -0.003295
                      21    hpq          732      .
                      22    hpq          733    -0.010654
                      23    hpq          734    -0.021687
                      24    hpq          735    -0.022168
                      25    hpq          738     0.015166
                      26    ibm          732      .
                      27    ibm          733     0.008740
                      28    ibm          734    -0.009963
                      29    ibm          735    -0.019694
                      30    ibm          738    -0.018751
                      31    ko           732      .
                      32    ko           733    -0.022324
                      33    ko           734     0.007611
                      34    ko           735    -0.022661
                      35    ko           738    -0.010222
ChrisNZ
Tourmaline | Level 20

The reason IF and LAG() can be used together in your case is that all the observations where the IF statement is true are sequential.

Ksharp
Super User

Calling @Rick_SAS 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 734 views
  • 0 likes
  • 3 in conversation