BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Junyong
Pyrite | Level 9

I have 5000 daily time series (var0001-var5000) from 01/01/2019 to 12/31/2019 and want to compute their rolling skewness each month. This post introduces proc summary for the rolling skewness.

data have(drop=i);
format t yymmddn6.;
array var(5000) var0001-var5000;
do t="1jan2019"d to "31dec2019"d;
do i=1 to 5000;
var(i)=rannor(1);
end;
output;
end;
run;

proc summary nway;
class t;
format t yymmn6.;
var var0001-var5000;
output out=want skew=;
run;

So I can quickly compute the 5000 skewness in January, February, and so on. Now suppose I want to compute each monthly skewness using two—or N more generally—consecutive months rather than just one month—for example, the February skewness using January and February data, the March skewness using February and March data, and so on. Can proc summary do this?

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

OK, I've thought some more about it.  Here's a solution that submits an expanded data set (data set view not data set file) to proc summary.

 

Test this out by using, say 10 or so variables  (%let NV=10).  Then you can examine results more closely.

 

%let nv=5000;
%let nv=0010;
data have(drop=i);
  format t yymmddn8.;
  array var{&nv} var0001-var&nv;
  do t="1jan2019"d to "31dec2019"d;
    do i=1 to &nv;
      var(i)=rannor(1);
    end;
    output;
  end;
run;

%let win_size=5;
%let up_bound=%eval(&win_size-1);

data need (rename=(t=t_end) drop=_:)  / view=need ;
  retain window_id;
  format t_beg yymmddn8. ;
  set have;
  t_beg=lag&up_bound(t);

  array vars {&nv} var0001-var&nv ;
  array win_data{0:&up_bound,&nv} _temporary_;

  _row=mod(_n_,&win_size);
  do _v=1 to dim(vars);
    win_data{_row,_v}=vars{_v};
  end;

  if _n_>=&win_size;
  window_id+1;

  output ;   /* Output the current record */
  /* Now output the earlier window constituents */
  do _obs= _n_-1 to _n_-&up_bound by -1;
    _row=mod(_obs,&win_size);
	do _v=1 to dim(vars);
	  vars{_v}=win_data{_row,_v};
	end;
	output;
  end;
run;

proc summary data=need noprint ;
  by window_id t_beg t_end ;
  var var: ;
  output out=want  skew= / autoname;
run;

The strategy above is to keep a 2-dimensional array with 5 rows (for winsize=5) and 5000 columns (number of variables.  Each incoming observation replaces the row of the array that is 5 observations old.  Also for each incoming observation calculate t_beg=lag4(t) - i.e. the starting date of the size 5 window.

 

Then comes the output.  Once you have a full window, increment window_id by 1.  OUTPUT the record-in-hand.  Then for the prior 4 days, transcribe the corresponding row of the array to the original variables, and output.  You now have a dataset, sorted by WINDOW_ID (with associated variables T_BEG and T_END) with 5 records per id.  Note it doesn't matter if the observations within the window are in chronological order - you're calculating skewness, not some time-series statistic. 

 

Then submit to PROC SUMMARY, with a BY WINDOW_ID T_BEG T_END; statement.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

3 REPLIES 3
mkeintz
PROC Star

The only way to get proc summary to generate rolling window skewness (or rolling window anything) is to expand the original data set.

 

Let's say your rolling window sizes are 5.   The you could make a data set sorted by the new variable WINDOW_ID, in sequences of 5 records.  So for window_id=1, you would have records 1-5, for window_id=2, you would have records 2-6, etc.  So that's a dataset 5 times your original dataset.  Imagine if you want window sizes of, say, 20 to 30.

 

 

But in any case, you cuold then run the PROC SUMMARY with a BY WINDOW_ID statement.  I don't suggest this.

 

Editted note:  However, you might not use too many resources by making a data set VIEW instead of a dataset FILE.

 

 

Do you have SAS/ETS (econometrics times series)?  If so, then proc expand is what you want.

 

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

OK, I've thought some more about it.  Here's a solution that submits an expanded data set (data set view not data set file) to proc summary.

 

Test this out by using, say 10 or so variables  (%let NV=10).  Then you can examine results more closely.

 

%let nv=5000;
%let nv=0010;
data have(drop=i);
  format t yymmddn8.;
  array var{&nv} var0001-var&nv;
  do t="1jan2019"d to "31dec2019"d;
    do i=1 to &nv;
      var(i)=rannor(1);
    end;
    output;
  end;
run;

%let win_size=5;
%let up_bound=%eval(&win_size-1);

data need (rename=(t=t_end) drop=_:)  / view=need ;
  retain window_id;
  format t_beg yymmddn8. ;
  set have;
  t_beg=lag&up_bound(t);

  array vars {&nv} var0001-var&nv ;
  array win_data{0:&up_bound,&nv} _temporary_;

  _row=mod(_n_,&win_size);
  do _v=1 to dim(vars);
    win_data{_row,_v}=vars{_v};
  end;

  if _n_>=&win_size;
  window_id+1;

  output ;   /* Output the current record */
  /* Now output the earlier window constituents */
  do _obs= _n_-1 to _n_-&up_bound by -1;
    _row=mod(_obs,&win_size);
	do _v=1 to dim(vars);
	  vars{_v}=win_data{_row,_v};
	end;
	output;
  end;
run;

proc summary data=need noprint ;
  by window_id t_beg t_end ;
  var var: ;
  output out=want  skew= / autoname;
run;

The strategy above is to keep a 2-dimensional array with 5 rows (for winsize=5) and 5000 columns (number of variables.  Each incoming observation replaces the row of the array that is 5 observations old.  Also for each incoming observation calculate t_beg=lag4(t) - i.e. the starting date of the size 5 window.

 

Then comes the output.  Once you have a full window, increment window_id by 1.  OUTPUT the record-in-hand.  Then for the prior 4 days, transcribe the corresponding row of the array to the original variables, and output.  You now have a dataset, sorted by WINDOW_ID (with associated variables T_BEG and T_END) with 5 records per id.  Note it doesn't matter if the observations within the window are in chronological order - you're calculating skewness, not some time-series statistic. 

 

Then submit to PROC SUMMARY, with a BY WINDOW_ID T_BEG T_END; statement.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
RichardDeVen
Barite | Level 11

Using a data row in a rolling window computation means the data row needs to be replicated for association to each window it will be part of (i.e. partitions).

A data row will thus need to be repeated with an appropriate window identifier that iterates from window (date) to window+N (of date). A month based window id from a date (your variable t) can be computed with INTCK('month',...)

 

Example:

Using @mkeintz data generator and summary.

* generate some data;

%let nv=5000; *%let nv=0010; data have(drop=i); format t yymmddn8.; array var{&nv} var0001-var&nv; do t="1jan2019"d to "31dec2019"d; do i=1 to &nv; var(i)=rannor(1); end; output; end; format var: 8.4; run;
* compute max month based window id; proc sql noprint; select max(intck('month', '01jan1960'd, t)) into :max_seqnum from have; %let MONTH_WINDOW = 2; data stage(index=(seqdate)); length seqnum 8; set have; by t; * enforce requirement of data being sorted by t; * convert date to a month based sequence number (window id); seqnum = intck('month', '01jan1960'd, t); * create replicates of row, associating data to N different month window partitions; do seqnum = seqnum to min(seqnum + &MONTH_WINDOW - 1, &MAX_SEQNUM); seqdate = intnx('month', '01jan1960'd, seqnum); * first of month for later reporting; output; end; format seqdate monyy7.; run; proc summary data=stage; by seqdate; var var:; output out=want skew= / autoname; run;

Sample output:

_FREQ_ is number of dates contributing to MONTH_WINDOW = 2 rolling month statistic (skew).

RichardADeVenezia_0-1598613674258.png

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1957 views
  • 2 likes
  • 3 in conversation