Hello!
I'm new to SAS and I love it! Hopefully I can explain my question clearly.
I'm working with a data set which contains daily security returns (and a variable to mark the dates of the returns). I need to create a data set in the same format with the average standard deviation of the previous five days of returns, including the current day.
I found code at http://support.sas.com/kb/41/380.html on calculating a rolling standard deviation using an array. My university's statistician helped me to adapt the code:
data sub(keep = B);
set returns;
run;
%let roll_num = 5;
data sub1;
set sub;
array summed[&roll_num] _temporary_;
if E = &roll_num then E = 1;
else E + 1;
summed
/*start summing once &roll_num values have been read from the data set*/
if _N_ >= &roll_num then do;
roll_std = std(of summed
end;
format roll_std comma10.4;
id = _n_;
run;
proc print data = sub1 (obs=10);
run;
This produces the following results:
B | E | roll_std | id |
---|---|---|---|
0.0055 | 1 | . | 1 |
-0.0098 | 2 | . | 2 |
0.0057 | 3 | . | 3 |
0.0029 | 4 | . | 4 |
0.0069 | 5 | 0.0069 | 5 |
0.0252 | 1 | 0.0125 | 6 |
0.0081 | 2 | 0.0088 | 7 |
0.0081 | 3 | 0.0086 | 8 |
0.0361 | 4 | 0.0132 | 9 |
0.0050 | 5 | 0.0135 | 10 |
This does some of what I want, but it is structured to perform the calculations on one variable only, as the KEEP=option [data sub(keep = B);] drops everything but the second variable in my data set. I think the intent was to use DO loops to perform this calculation on all the variables, but I do not know how to do this or to capture the output that I want (the roll_std variable). This would be one way to complete this problem.
Alternatively, it may be easier to keep all the variables in the sub data set and expand this code to the entire data set. I tried removing the KEEP= option, but this gives me the following error after the last DATA step: "ERROR: Array subscript out of range" at line xx column y.
Thanks in advance for any help with this!
Hope this provides the result you are looking for. Modified the code at http://support.sas.com/kb/41/380.html by adding a macro to support multiple columns. Code to create the input dataset has also been included so you can see the entire process. After running this program see the log file for code written by the macro.
options mprint;
/* create monthly sales data from January 2008 to December 2010 */
data work.test;
do mo_period = '01jan2008'd to '31dec2010'd;
mo_period = intnx('month', mo_period, 0, 'END');
/* create 3 colums, one for each value to be evaluated. */
do eval_column = 'col_1', 'col_2', 'col_3';
sales = round(ranuni(1234567) * 1000, .01);
output;
end;
end;
format mo_period monyy7. sales comma10.2;
run;
/* confirm output is as expected. */
proc print data=work.test(obs=20);
run;
/* convert data to across format. */
proc transpose data=work.test out=work.test1(drop=_name_);
var sales;
by mo_period;
id eval_column;
run;
/* confirm output is as expected. */
proc print data=work.test1(obs=10);
run;
/*specify the number of periods in the rolling sum and average*/
%let roll_num = 12;
/*specify the number of columns to process */
%let columns = 3;
%macro tst;
data work.new;
set work.test1;
/*create array with specific number of elements*/
/*passing _temporary_ arrays as arguments to functions is not supported until SAS 9.2*/
/*if the array is a _TEMPORARY_ array, the elements are automatically retained*/
/* for the first observation only, create an array for each column to be evaluated.*/
if (_n_ = 1) then
do;
%do i = 1 %to &columns;
array summed_&i [&roll_num] _temporary_;
format roll_sum_&i roll_avg_&i comma10.2;
%end;
end;
/*alternatively, a non-temporary array can be used but must be retained:*/
/*array summed[&roll_num];*/
/*retain summed;*/
/*E represents the element of the array to assign a sales value*/
/*Increment it by one unless it is equal to &roll_num, at which point*/
/*start over and assign it a value of 1. This causes the oldest period to*/
/*be replaced by the newest period once &roll_num periods have been read.*/
if (E = &roll_num) then
E = 1;
else
E + 1;
%do i = 1 %to &columns;
/*assign value to proper element of the array*/
summed_&i
/*start summing once &roll_num values have been read from the data set*/
if (_N_ >= &roll_num) then
do;
roll_sum_&i = sum(of summed_&i
roll_avg_&i = mean(of summed_&i
roll_std_&i = std(of summed_&i
end;
%end;
run;
%mend tst;
%tst;
options ls=160 ps=30;
proc print;
run;
Hope this provides the result you are looking for. Modified the code at http://support.sas.com/kb/41/380.html by adding a macro to support multiple columns. Code to create the input dataset has also been included so you can see the entire process. After running this program see the log file for code written by the macro.
options mprint;
/* create monthly sales data from January 2008 to December 2010 */
data work.test;
do mo_period = '01jan2008'd to '31dec2010'd;
mo_period = intnx('month', mo_period, 0, 'END');
/* create 3 colums, one for each value to be evaluated. */
do eval_column = 'col_1', 'col_2', 'col_3';
sales = round(ranuni(1234567) * 1000, .01);
output;
end;
end;
format mo_period monyy7. sales comma10.2;
run;
/* confirm output is as expected. */
proc print data=work.test(obs=20);
run;
/* convert data to across format. */
proc transpose data=work.test out=work.test1(drop=_name_);
var sales;
by mo_period;
id eval_column;
run;
/* confirm output is as expected. */
proc print data=work.test1(obs=10);
run;
/*specify the number of periods in the rolling sum and average*/
%let roll_num = 12;
/*specify the number of columns to process */
%let columns = 3;
%macro tst;
data work.new;
set work.test1;
/*create array with specific number of elements*/
/*passing _temporary_ arrays as arguments to functions is not supported until SAS 9.2*/
/*if the array is a _TEMPORARY_ array, the elements are automatically retained*/
/* for the first observation only, create an array for each column to be evaluated.*/
if (_n_ = 1) then
do;
%do i = 1 %to &columns;
array summed_&i [&roll_num] _temporary_;
format roll_sum_&i roll_avg_&i comma10.2;
%end;
end;
/*alternatively, a non-temporary array can be used but must be retained:*/
/*array summed[&roll_num];*/
/*retain summed;*/
/*E represents the element of the array to assign a sales value*/
/*Increment it by one unless it is equal to &roll_num, at which point*/
/*start over and assign it a value of 1. This causes the oldest period to*/
/*be replaced by the newest period once &roll_num periods have been read.*/
if (E = &roll_num) then
E = 1;
else
E + 1;
%do i = 1 %to &columns;
/*assign value to proper element of the array*/
summed_&i
/*start summing once &roll_num values have been read from the data set*/
if (_N_ >= &roll_num) then
do;
roll_sum_&i = sum(of summed_&i
roll_avg_&i = mean(of summed_&i
roll_std_&i = std(of summed_&i
end;
%end;
run;
%mend tst;
%tst;
options ls=160 ps=30;
proc print;
run;
Thanks bcs,
This was what I was looking for. I had to rename my variables to match col_1, col_2, etc., but once I did that, it worked great.
Thanks!
Do you have a license for SAS/ETS? If so, look into proc expand which is much easier.
Reeza,
I do have a SAS license. I looked into proc expand and I can see what you mean. In the future, I'll have to use it instead of this method.
Thanks for the helpful suggestion.
What does your data look like , and what does your output look like ?
Hi Xia,
My data has a variable with dates, as well as several variables with returns (like column B in my original post). I've now renamed these variables "col_1, col_2, etc." so they work with the macro bcs wrote.
The output has the same data but also adds variables like roll_std_1, roll_std_2, etc. to match the input variables. This data looks like the roll_std column in my original post.
Does this answer your question?
OK. But I would like to use SQL.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.