Hi everyone,
For a particular variable (traded volume), I need to get the mean of the past 252 observations. Anyone can help me with that?
Thanks!
Here is one way. I just created some madeup data
data have;
do _RIC='A', 'B', 'C';
do date='01jan2017'd to '23oct2019'd;
buy_tradevolume=ceil(rand('uniform')*10000);
output;
end;
end;
format date ddmmyy10.;
run;
data want;
array lag[0:251] _temporary_;
call missing(of lag[*]);
do _N_=1 by 1 until (last._RIC);
set have;
by _RIC;
lag[mod(_N_, 252)]=buy_tradevolume;
avg=ifn(_N_ <= 252, ., mean(of lag[*]));
output;
end;
run;
What if there are not 252 past obs? Should the value be missing?
Do you have By-Groups in your data?
Can you show us some example input data and the expected result?
Here's attached a picture of my dataset, it's a PDF, hope that's ok
It is grouped by _RIC so you can only see the first one and if a RIC doesn't previous 252 days then it is ignored.
Thanks 🙂
Here is one way. I just created some madeup data
data have;
do _RIC='A', 'B', 'C';
do date='01jan2017'd to '23oct2019'd;
buy_tradevolume=ceil(rand('uniform')*10000);
output;
end;
end;
format date ddmmyy10.;
run;
data want;
array lag[0:251] _temporary_;
call missing(of lag[*]);
do _N_=1 by 1 until (last._RIC);
set have;
by _RIC;
lag[mod(_N_, 252)]=buy_tradevolume;
avg=ifn(_N_ <= 252, ., mean(of lag[*]));
output;
end;
run;
Great, I don't understand a thing of what you wrote but it perfectly works! 🙂
Thanks so much.
@Antoine44 wrote:
Great, I don't understand a thing of what you wrote but it perfectly works! 🙂
Then you should not use it. It is essential to know what a piece of code you employ does, and how it does it.
Please tell us which parts you do not understand, and we can supply the necessary information.
I think I can answer no to all your questions.
@Antoine44 wrote:
I think I can answer no to all your questions.
Arrays are basic components in all programming languages. An array is a chain of equally typed variables, where the individual members can be addressed with an index. The variables in a temporary array will not be included in the output. Details of the array statement and handling of array are found here.
call missing() sets all the variables (columns) given as parameters to missing; you can mix datatypes in the parameters
of .... means that a list of variables follows; in this case it is the whole array that will be set to missing. This is necessary here because temporary arrays keep their values from one data step iteration to another, and we do not want that here.
do loops come in several flavors: iterative, until, while; these flavors can be combined. Here an iterative counter (_N_) is used, but with no upper ("to") bound; the end of the loop comes when the last observation of a group is read (the "until" condition).
The mod() function performs the modulo operation: 1 will be 1, 2 will be 2, 251 will be 251, 252 will be zero, 253 will be 1, 254 will be 2 and so on. So you can see that you only get correct values for the addressing of array elements (which was defined 0:251).
ifn() is a function that returns a numeric value (there's also a ifc()), and it requires three parameters: a condition, a value to be returned when the condition is true, and a value to be returned when false. In this case, as long as our counter is less than 252, we return missing values.
mean() simply returns the arithmetic average of all parameter values, which are built from the whole array once again.
The code fills the array in a round-robin fashion, and starts to return averages as soon as the whole array is filled with values.
Thanks for that, that helps. So the code I got previously does work, however ti actually doesn't do precisely what I was after.
for example, I get this:
n total_buyvolume avg
1 38224.373 .
2 50590.5011 .
.
.
.
252 292598.3558 .
253 550276.994 3078902.8083
the problem here is that the avg figure on observation 253 is from observation 2 to observation 253. However, I'd like the avg figure on observation 253 it to be from observation 1 to observation 252.
Meaning that I'd like to have the previous 252 days on a given day without including the figure on that particular day. Does it make sense? How do I do that?
Do like this
data want;
array lag[0:251] _temporary_;
call missing(of lag[*]);
do _N_=0 by 1 until (last._RIC);
set have;
by _RIC;
avg=ifn(_N_ < 252, ., mean(of lag[*]));
output;
lag[mod(_N_, 252)]=buy_tradevolume;
end;
run;
yep that works.
Thanks so much!
Do NOT post data in pictures. Do you really expect we start typing it off the picture row by row, column by column just so we have something for testing?
Please post data in a data step with datalines, or use a loop like @PeterClemmensen, so we can easily create the dataset with a copy/paste and run.
If you are really interested in getting help, invest the time and help us to help you.
Sorry, will do next time.
Thanks.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.