BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Antoine44
Fluorite | Level 6

Hi everyone,

 

For a particular variable (traded volume), I need to get the mean of the past 252 observations. Anyone can help me with that?  

 

Thanks!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Here is one way. I just created some madeup data

 

data have;
    do _RIC='A', 'B', 'C';
        do date='01jan2017'd to '23oct2019'd;
            buy_tradevolume=ceil(rand('uniform')*10000);
            output;
        end;
    end;
    format date ddmmyy10.;
run;

data want;
    array lag[0:251] _temporary_;          
    call missing(of lag[*]);          
    do _N_=1 by 1 until (last._RIC);         
        set have;
        by _RIC;
        lag[mod(_N_, 252)]=buy_tradevolume;                             
        avg=ifn(_N_ <= 252, ., mean(of lag[*]));
        output;
    end;
run;

View solution in original post

13 REPLIES 13
PeterClemmensen
Tourmaline | Level 20

What if there are not 252 past obs? Should the value be missing?

 

Do you have By-Groups in your data?

andreas_lds
Jade | Level 19

Can you show us some example input data and the expected result?

Antoine44
Fluorite | Level 6

Here's attached a picture of my dataset, it's a PDF, hope that's ok

 

It is grouped by _RIC so you can only see the first one and if a RIC doesn't previous 252 days then it is ignored.

 

Thanks 🙂 

PeterClemmensen
Tourmaline | Level 20

Here is one way. I just created some madeup data

 

data have;
    do _RIC='A', 'B', 'C';
        do date='01jan2017'd to '23oct2019'd;
            buy_tradevolume=ceil(rand('uniform')*10000);
            output;
        end;
    end;
    format date ddmmyy10.;
run;

data want;
    array lag[0:251] _temporary_;          
    call missing(of lag[*]);          
    do _N_=1 by 1 until (last._RIC);         
        set have;
        by _RIC;
        lag[mod(_N_, 252)]=buy_tradevolume;                             
        avg=ifn(_N_ <= 252, ., mean(of lag[*]));
        output;
    end;
run;
Antoine44
Fluorite | Level 6

Great, I don't understand a thing of what you wrote but it perfectly works! 🙂 

 

Thanks so much.

Kurt_Bremser
Super User

@Antoine44 wrote:

Great, I don't understand a thing of what you wrote but it perfectly works! 🙂 

 


Then you should not use it. It is essential to know what a piece of code you employ does, and how it does it.

Please tell us which parts you do not understand, and we can supply the necessary information.

  • do you know what an array in SAS is?
  • do you know what call missing() does?
  • do you know what "of lag[*]" means?
  • do you know what a do loop is, which types of it there are, and when it will stop?
  • do you know what the mod(), ifn(), and mean() functions do?
Antoine44
Fluorite | Level 6

I think I can answer no to all your questions.

Kurt_Bremser
Super User

@Antoine44 wrote:

I think I can answer no to all your questions.


Arrays are basic components in all programming languages. An array is a chain of equally typed variables, where the individual members can be addressed with an index. The variables in a temporary array will not be included in the output. Details of the array statement and handling of array are found here.

 

call missing() sets all the variables (columns) given as parameters to missing; you can mix datatypes in the parameters

 

of ....   means that a list of variables follows; in this case it is the whole array that will be set to missing. This is necessary here because temporary arrays keep their values from one data step iteration to another, and we do not want that here.

 

do loops come in several flavors: iterative, until, while; these flavors can be combined. Here an iterative counter (_N_) is used, but with no upper ("to") bound; the end of the loop comes when the last observation of a group is read (the "until" condition).

 

The mod() function performs the modulo operation: 1 will be 1, 2 will be 2, 251 will be 251, 252 will be zero, 253 will be 1, 254 will be 2 and so on. So you can see that you only get correct values for the addressing of array elements (which was defined 0:251).

 

ifn() is a function that returns a numeric value (there's also a ifc()), and it requires three parameters: a condition, a value to be returned when the condition is true, and a value to be returned when false. In this case, as long as our counter is less than 252, we return missing values.

 

mean() simply returns the arithmetic average of all parameter values, which are built from the whole array once again.

 

 

The code fills the array in a round-robin fashion, and starts to return averages as soon as the whole array is filled with values.

Antoine44
Fluorite | Level 6

Thanks for that, that helps. So the code I got previously does work, however ti actually doesn't do precisely what I was after.

 

for example, I get this:

 

n       total_buyvolume      avg

 

1        38224.373                 .

2        50590.5011               .

.

.

.

252    292598.3558             .

253    550276.994          3078902.8083

 

the problem here is that the avg figure on observation 253 is from observation 2 to observation 253. However, I'd like the avg figure on observation 253 it to be from observation 1 to observation 252.
Meaning that I'd like to have the previous 252 days on a given day without including the figure on that particular day. Does it make sense? How do I do that? 

PeterClemmensen
Tourmaline | Level 20

Do like this

 

data want;
    array lag[0:251] _temporary_;          
    call missing(of lag[*]);     
    do _N_=0 by 1 until (last._RIC);         
        set have;
        by _RIC;                         
        avg=ifn(_N_ < 252, ., mean(of lag[*]));
        output;
        lag[mod(_N_, 252)]=buy_tradevolume;  
    end;
run;
Antoine44
Fluorite | Level 6

yep that works.

 

Thanks so much!

Kurt_Bremser
Super User

Do NOT post data in pictures. Do you really expect we start typing it off the picture row by row, column by column just so we have something for testing?

Please post data in a data step with datalines, or use a loop like @PeterClemmensen, so we can easily create the dataset with a copy/paste and run.

If you are really interested in getting help, invest the time and help us to help you.

Antoine44
Fluorite | Level 6

Sorry, will do next time.

 

Thanks.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 13 replies
  • 2091 views
  • 2 likes
  • 4 in conversation