BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Yamani
Obsidian | Level 7
Hello All, ************************************************************************************************************************************* ******* Input data set: 'developingrank', and it is attached, which includes time series data for N countries ************************************************************************************************************************************* ****** Code: I use the following code to build a data set for rolling regression (i.e., dropping earlier observations as additional observations become available): data roll; set developingrank; i=1;do while(i<=141); output; rankdate=intnx('month', date, 0)-1; i=i+1;; end;run; data roll; set roll; where ('30Aug08'd <= rankdate); run; ************************************************************************************************************************************* ****** Output (attached): Roll The output of the above code is data set 'roll' (attached), so that - the first rankdate (Dec 1996) includes data from December 1996 to August 2008 (141 observations), - the second rankdate (Jan 1997) includes data from January 1997 to September 2008 (141 observations), - the third rankdate (Feb 1997) includes data from Feb1997 to October 2008 (141 observations) , and so on ************************************************************************************************************************************* ****** REQUIRED CODE: I want to change the above SAS code to run recursive regression (i.e. keeps the starting date (i.e., December 1996) fixed, and then adding an observation to the end of the sample with every run of the regression). ************************************************************************************************************************************* ****** REQUIRED OUTPUT DATASET: So, the desired output dataset should be as follows: - the first rankdate (Dec 1996) includes data from December 1996 to August 2008 (141 observations) , - the second rankdate (Jan 1997) includes data from December 1996 to September 2008 (142 observations), - the third rankdate (Feb 1997) includes data from December 1996 to October 2008 (143 observations) , and so on ........ Any feedback will be appreciated. Thanks in advance.
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Why do you set macrovar EARLIEST_ENDDATE to August 2017?  Seems to me that is the latest enddate, which is not a needed parameter for this task.

 

Given your earliest_begdate=31dec1996, deterimine what window size is the shortest permisssable (5 months, 15 months after 31dec1996?).  Then use the corresponding date as EARLIEST_ENDDATE.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

9 REPLIES 9
Yamani
Obsidian | Level 7
I am attaching the output data set (Roll)
Reeza
Super User

I don't have time to work on something this complex, especially to download a file and work with it. Here's an example I did for rolling means. It should give you the idea on how to do your recursive regression. Good Luck. https://gist.github.com/statgeek/e5e43ff45a4ba1f64d0873ff3bc35974

mkeintz
PROC Star

Are you saying that you want to do a regression for every rolling window of every size from NV+1 up to 141 months, for each country (where NV is n of vars in your model)?

 

In other words, if each country has 240 months of data, and 5 variables in your regression model, you want the following collection of windows

       235 rolling windows of size 6

       234 rolling windows of size 7

       ...

       100 rolling windows of size 141

==============================

     22,780 rolling windows per country containing 1,464,720 records

 

I simply don't see the point of doing this.

 

But on the subject of efficient performance of rolling regressions, it makes much more sense to construct rolling SSCP's which can then be passed to PROC REG, with an accompanying BY COUNTRY WINDOW_ID statement (or in your case BY COUNTRY START_DATE END_DATE).  The  rolling SSCP can be efficiently constructed by single pass through the original 240 observations, adding a new observation to the SSCP and simultaneously dropping the old observation leaving the rolling window. That's relatively trivial, but quite efficient  You have the non-trivial complication of wanting a range of window sizes.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Yamani
Obsidian | Level 7

Thanks for your help. Not exactly. 

 

I have 10 countries with each country has 239 months of data,

 

I want the following collection of 98 recursive windows for EACH country, as follows:

Country 1:

Window 1 has the first 141 observations (first 141 months)

Window 2 has the first 142 observations (first 142 months)

Window 3 has the first 143 observations (first 141 months)

...

Window 98 has the first 239 observations (first 239 months: and this is the full sample period)

 

Country 2:

Window 1 has the first 141 observations (first 141 months)

Window 2 has the first 142 observations (first 142 months)

Window 3 has the first 143 observations (first 141 months)

...

Window 98 has the first 239 observations (first 239 months: and this is the full sample period)

 

an so on.....

 

==============================

ballardw
Super User

The best way to provide data examples is in the form of data step code. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

Data presented that way removes ambiguities of character/numeric or date value vs character and a few other issues common with pasted text "data'.

 

Then post a second data step of the resultant data set for the example input.

mkeintz
PROC Star

Here's a sample program that does "recursive" regressions using sashelp.stocks.  Each regression is identified by STOCK, BEG_DATE, and END_DATE, where BEG_DATE in your case is a constant.  The "trick" is not to constantly re-read observations to perform recursive regressions, but rather to pass through the data once, creating recursive SSCP's, which can then be read into PROC REG, with a BY STOCK BEG_DATE END_DATE set of by groups.  Of course, you'll probably want to tell the proc reg to output datasets of estimates rather than printing to the listing file as I have below:

 

proc sort data=sashelp.stocks out=have;

by stock date;

format date date9.;

run;

%let earliest_begdate=01jan1990;

%let earliest_enddate=01jan1992;

 

 

data recursive_sscp (type=SSCP drop=_I _J _nv_plus_intercept _nobs rename=(date=end_date));

set have (keep=stock date close high low open);

by stock;

where date >= "&earliest_begdate"d;

retain beg_date ;

format beg_date date9.;

array vars{*} Intercept Open High Low Close;

array vnames{5} $32 _temporary_ ('Intercept','Open','High','Low','Close');

retain _nv_plus_intercept 5;

array sscp{5,5} _temporary_;

intercept=1;

_nobs+1;

if first.stock then do;

 

beg_date=date;

_nobs=1;

call missing(of sscp{*});

end;

do _I=1 to _nv_plus_intercept;

sscp{_I,_I}+vars{_I}**2;

if _I < _nv_plus_intercept then do _J=_I+1 to _nv_plus_intercept ;

 

sscp{_I,_J}+vars{_I}*vars{_J};

sscp{_J,_I}=sscp{_I,_J};

end;

end;

if date>="&earliest_enddate"d;

_type_='SSCP';

do _I=1 to _nv_plus_intercept;

 

_name_=vnames{_I};

do _J=1 to _nv_plus_intercept;

 

vars{_j}=sscp{_I,_J};

end;

output;

end;

_type_='N';

_name_=' ';

do _J=1 to _nv_plus_intercept;

 

vars{_J}=_nobs;

end;

output;

run;

 

 

proc reg data=recursive_sscp plots=none;

by stock beg_date end_date;

model close=open low high;

quit;

 

 

 

Note that not only is there a variable named _TYPE_ in data set recursive_sscp, but also the dataset itself has TYPE=SSCP (run a proc contents on it and you'll see).  The longer your recursive windows become, the more likely you'll see performance improvements.

 

[edited addition]

Also I have set to macrovars to tell SAS when the beginning of each window is to be, and also the range of the shortest acceptable window.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Yamani
Obsidian | Level 7

Thanks for your reply. This is helpful.

I adjusted the code you sent to my data. However, the regressions are estimated just once for each country with the same and ending date. I don't know how to pass through the data once


proc sort data=ma.developed out=have;
by country date;
format date date9.;
run;
%let earliest_begdate=31dec1996;
%let earliest_enddate=31aug2017;

data recursive_sscp (type=SSCP drop=_I _J _nv_plus_intercept _nobs rename=(date=end_date));
set have (keep=country date return MA);
by country;
where date >= "&earliest_begdate"d;
retain beg_date ;
format beg_date date9.;
array vars{*} Intercept return MA;
array vnames{3} $32 _temporary_ ('Intercept','return','MA');
retain _nv_plus_intercept 3;
array sscp{3,3} _temporary_;
intercept=1;
_nobs+1;
if first.country then do;

beg_date=date;
_nobs=1;
call missing(of sscp{*});
end;
do _I=1 to _nv_plus_intercept;
sscp{_I,_I}+vars{_I}**2;
if _I < _nv_plus_intercept then do _J=_I+1 to _nv_plus_intercept ;

sscp{_I,_J}+vars{_I}*vars{_J};
sscp{_J,_I}=sscp{_I,_J};
end;
end;
if date>="&earliest_enddate"d;
_type_='SSCP';
do _I=1 to _nv_plus_intercept;

_name_=vnames{_I};
do _J=1 to _nv_plus_intercept;

vars{_j}=sscp{_I,_J};
end;
output;
end;
_type_='N';
_name_=' ';
do _J=1 to _nv_plus_intercept;

vars{_J}=_nobs;
end;
output;
run;

proc reg data=recursive_sscp plots=none;
by country beg_date end_date;
model RETURN = MA;
quit;
mkeintz
PROC Star

Why do you set macrovar EARLIEST_ENDDATE to August 2017?  Seems to me that is the latest enddate, which is not a needed parameter for this task.

 

Given your earliest_begdate=31dec1996, deterimine what window size is the shortest permisssable (5 months, 15 months after 31dec1996?).  Then use the corresponding date as EARLIEST_ENDDATE.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Yamani
Obsidian | Level 7
mkeintz,

Awesome!! Finally it works! I was working on this point for more than a week!

I really appreciate your help. Thanks a lot!

Here is the adjusted code, in case anyone else needs it:


proc sort data=ma.developed out=have;
by country date;
format date date9.;
run;
%let earliest_begdate=31dec1996;
%let earliest_enddate=31aug2008;

data recursive_sscp (type=SSCP drop=_I _J _nv_plus_intercept _nobs rename=(date=end_date));
set have (keep=country date return MA);
by country;
where date >= "&earliest_begdate"d;
retain beg_date ;
format beg_date date9.;
array vars{*} Intercept return MA;
array vnames{3} $32 _temporary_ ('Intercept','return','MA');
retain _nv_plus_intercept 3;
array sscp{3,3} _temporary_;
intercept=1;
_nobs+1;
if first.country then do;

beg_date=date;
_nobs=1;
call missing(of sscp{*});
end;
do _I=1 to _nv_plus_intercept;
sscp{_I,_I}+vars{_I}**2;
if _I < _nv_plus_intercept then do _J=_I+1 to _nv_plus_intercept ;

sscp{_I,_J}+vars{_I}*vars{_J};
sscp{_J,_I}=sscp{_I,_J};
end;
end;
if date>="&earliest_enddate"d;
_type_='SSCP';
do _I=1 to _nv_plus_intercept;

_name_=vnames{_I};
do _J=1 to _nv_plus_intercept;

vars{_j}=sscp{_I,_J};
end;
output;
end;
_type_='N';
_name_=' ';
do _J=1 to _nv_plus_intercept;

vars{_J}=_nobs;
end;
output;
run;

proc reg data=recursive_sscp plots=none;
by country beg_date end_date;
model RETURN = MA;
quit;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 3814 views
  • 3 likes
  • 4 in conversation