DATA Step, Macro, Functions and more

window operation by group

Accepted Solution Solved
Reply
Highlighted
Occasional Contributor
Posts: 14
Accepted Solution

window operation by group

Hi,

 

This is an example dataset for my issue.

proc sort data=sashelp.stocks out=stocks;
by stock date;
run;

 

I am trying to perform a moving average and other similar windowed operation by group in a data step. In the example data set I want to calculate moving average of "Open" by "Stock" and use "Date" column to create variable window size. A sample output looks like as below.

 

image.png

What will be a data step framework like if I want to do this for all the numerical columns in the initial data and also want to be able to use macros(example takes data from open column and date column to spit some output for each moving window) for specific task like moving average?

 

Any help will be appreciated.

 

Thanks,

 

 


Accepted Solutions
Solution
2 weeks ago
Super User
Super User
Posts: 6,706

Re: window operation by group

So if you like that method then here is a macro to implement it.

%macro mavg(in,out,id,order,vars,lags);
%local i j ni nj var lag ;
%let ni=%sysfunc(countw(&vars));
%let nj=%sysfunc(countw(&lags));
data &out ;
  set &in(keep=&id &order &vars);
  by &id &order ;
  n+1 ;
%do i=1 %to ∋
  %let var=%scan(&vars,&i);
  %do j=1 %to &nj;
    %let lag=%scan(&lags,&j);
  array lag&lag._&var. (0:%eval(&lag.-1)) _temporary_ ;
  %end;
%end;
  if first.%scan(&id,-1) then do;
    n=1;
    call missing(of 
%do i=1 %to ∋
  %do j=1 %to &nj;
    lag%scan(&lags,&j)_%scan(&vars,&i)(*)
  %end;
%end;
    );
  end;
%do i=1 %to ∋
  %let var=%scan(&vars,&i);
  %do j=1 %to &nj;
    %let lag=%scan(&lags,&j);
  lag&lag._&var.(mod(n,&lag))=&var ;
  if n >= &lag. then &var._avg&lag. = round(mean(of lag&lag._&var.(*)),.01);
  %end;
%end;
run;
%mend mavg ;

And here is how to call it to replicate the results from before.

%mavg
(in=stocks
,out=want
,id=stock
,order=date
,vars=open close
,lags=3 4
);

View solution in original post


All Replies
Super User
Super User
Posts: 7,565

Re: window operation by group

I don't see any need to use macros here, simple arrays will suffice.  In future post test data in the form of a datastep (using the {i} code window above post to keep formatting).  Also post example output required.  At a guess:

data want;
  set have;
  by stock;
  array nums _numeric_;
  do over nums;
    ...;
  end;
run;

The by group does the grouping, the _numeric_ selects all numeric variables into an array reference nums.  The do loop does code over that array.  Obviously without further information can't tailor it any further.

Occasional Contributor
Posts: 14

Re: window operation by group

The sample output is shown in the picture contained in original post.
Macro use was referenced to calculation of average and similar operation on the given window size.
Window is dynamic, as shown in sample output i want output column for each window size.
Super User
Super User
Posts: 7,565

Re: window operation by group

Maybe start by taking a look at the guidance on posting a question found by the post button on new posts, and review some other questions given here.  Post test data as a datastep in the body of the post, a screenshot is no good for anyone.

"Macro use was referenced to calculation of average and similar operation on the given window size." - this sentance makes no sense to me.  First write Base SAS code which works, if you then need to repeat it, then macrotise it, so far you have shown no working code at all so now is not the time to be talking about macros.  Base SAS is the language, and has functions like mean(), and proc means to get these types of outputs.

Window is dynamic, I cannot tell anything from the screenshot - I need test data before, and example output as it should be afterwards, and clarification of the logic between the two, I cannot guess what you want doing.

Occasional Contributor
Posts: 14

Re: window operation by group

Thanks for your quick response RW9 and sorry for not being clear.

 

Test data is sashelp.stocks data, so I assumed it would be accessible to a SAS user hence i used proc sort to show sample test data. I felt the first step will be to sort the data by "stock" and "date" so the code.

 

Screenshot is only for the sample expected output.

 

Let's forget about macro for now and say I want to calculate moving average for various window size (2, 3, 5, 7 etc.)

 

 

Super User
Posts: 18,588

Re: window operation by group

Look at  the TIMESERIES and EXPAND procedure, I believe there a tasks for these in EG

Occasional Contributor
Posts: 14

Re: window operation by group

Thanks for your response Reeza, but unfortunately I do not have license for either of those. Trying to do this using data step.

Super User
Posts: 10,873

Re: window operation by group

How long of a "moving average" do you want to use? 2 months, 3 months, 6 months, year something else?

And since your data is apparently in reverse chronologic order what type of period do you want?  That example data would not result from your sort code as this is the first few records when I run the sort:

 

 

Obs Stock Date Open High
1 IBM 01AUG86 $132.63 $142.00
2 IBM 02SEP86 $138.88 $145.50
3 IBM 01OCT86 $134.88 $135.50
4 IBM 03NOV86 $124.00 $127.87
5 IBM 01DEC86 $126.50 $130.13
6 IBM 02JAN87 $121.00 $130.88
7 IBM 02FEB87 $128.25 $144.25
8 IBM 02MAR87 $139.50 $156.63
9 IBM 01APR87 $149.00 $161.50
10 IBM 01MAY87 $160.00 $167.63
Occasional Contributor
Posts: 14

Re: window operation by group

Thanks for your response ballardw!

proc sort data=sashelp.stocks out=stocks;
by stock date;
run;
data stocks;
set stocks (where =(Date>'01SEP04'd));
drop high low close Volume AdjClose; 
run;

I just made a smaller data for inital start. The output sample is only a snapshot of that.

I want various length of moving average, 2 months, 3 months, 5 months, 7 months etc. (one column output for each). So 3_month_average in row with date AUG2017 should have average of (AUG2017, JUL2017, JUN2017). Similarly, 3_month_average in row with date JUL2017 should have average of (JUL2017, JUN2017, MAY2017), and so on.

 

I did a descending sort by date to show the final sample output hence in reverse order.

 

I have been doing this by doing a dynamic lag using a very complicated macro. I am looking for an efficient way so that I can scale it and apply the same to multiple attributes like "open".

 

 

 

Occasional Contributor
Posts: 14

Re: window operation by group

Reeza thanks for your links.

 

So, let's say I have a macro like this

 

%macro moving_avg(have, want, groupby_var, time_var, var_name, max_window_size, window_sizes);

an execution example of which is 

 

 

%moving_avg(have, want, stock, date, open, 7, '2,3,5,7');

"max_window_size" is used to create maximum number of lag needed to accomodate all window size.

 

"window_sizes" is all possible window size I want to do moving avg for.

"have" has columns "stock", "date" and "open"

"want" will have columns "stock", "date", "open", "avg2", "avg3", "avg5" and "avg7"

 

If I need to do this for all other columns that are numeric not just "open" how do I structure my data step?

 

 

Super User
Posts: 18,588

Re: window operation by group

It depends on the macro. WIthout seeing it can't say. It could be designed to work inside a data step or it could be self contained. 

 

 

Occasional Contributor
Posts: 14

Re: window operation by group

 

%macro moving_avg(have, want, groupby_var, time_var, var_name, max_window_size, window_sizes); 
/* Sort "have" by "groupby_var" and "time_var" */
/* Assign an array of macro with window size values (includes multiple steps) */
data want; 
set have; 
by groupby_var;
/* Create array using "max_window_size" for lag variables */
/* Create lag variables (this creates additional lag columns: lag_1-lag_max_window_size) */
/* Here I have a step to correct for the bleeding of the lag variable to the next group */
/*I use a do loop to calculate mov_avg for various window size; "&abc" in the code below is to get the window size dynamically*/ 
%do j = 1 %to &yy.; 
&var_name.avg_&&abc&j = round(mean(of lag1 - lag&&abc&j), 0.01); 
%end;
/* drop variables that are not required */ 
run;
%mend;

 

Does this help for the macro content?

Super User
Super User
Posts: 6,706

Re: window operation by group

It is good to have a vision for how you would like to be able to make the specification of your choices (basically what you would like the macro call to look like) but to begin actually writing a macro you need know what actual SAS code it needs to generate.

 

Which part do you need help with?

 

If you know how to implement a specific window algorithm show us how you would do that and explain what parts you want to be able to vary by making it a macro.

 

If you know how you want to be able to call the macro then show that and explain exactly what each parameter means. Include examples. 

Occasional Contributor
Posts: 14

Re: window operation by group

I already have a macro that gives me the moving average, the structure shown above is of what I have.

 

I am trying to learn what's an efficient way of putting it together for scaling. To Reeza's point self-contained macro or data step like below (which is not correct). I want to be able to loop through the variables (open, high, low and close) and have a combined "final" data. want1 - want4 are intermediate outputs that I get.

 

I am looking for a SAS code structure that is efficient for work like this with large dataset.

 

data final;
   set have;
   %moving_avg(have, want1, stock, date, open, 4, '2,3,4');
   %moving_avg(have, want2, stock, date, high, 4, '2,3,4');
   %moving_avg(have, want3, stock, date, low, 4, '2,3,4');
   %moving_avg(have, want4, stock, date, close, 4, '2,3,4');
run;

Thanks for your response Tom!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 23 replies
  • 173 views
  • 6 likes
  • 5 in conversation