BookmarkSubscribeRSS Feed
Waqar
Calcite | Level 5

Hi,

I have return data for different firms and need to calculate 36 month rolling standard deviation. I used the proc expand command as follows

PROC EXPAND DATA=a1 OUT=rollingstd;

CONVERT return=std / TRANSFORMOUT=(MOVSTD 35);

by permno;

RUN;

However I also need a variable to calculate the number of observations used to calculate a particular standard deviation like rolling_N because later on in my dataset I need to put this condition that if data for a particular company is less than one year (12 months) then for such companies i need to use the average standard deviation of all companies.

not sure how to calculate this rolling_N variable that lets me know how many observations were used in calculating the given standard deviation number from the proc expand command.

Many thanks for your help in advance.

7 REPLIES 7
Reeza
Super User

Wouldn't the N be 35 as you've specified?

And my assumption would be that less than 35 would be blank/missing.

Waqar
Calcite | Level 5

No I get a number for every observation except the first observation for each firm. Yes MOVSTD can be set to 36.

Reeza
Super User

Look into the options for dealing with missing values in the docs

SAS/ETS(R) 9.2 User's Guide

You can truncate the length of the result series by using the TRIM, TRIMLEFT, and TRIMRIGHT operators to set values to missing at the beginning or end of the series.

You can use these functions to trim the results of moving time window operators so that the result series contains only values computed from a full width time window. For example, the following statements compute a centered five-period moving average of X, and they set to missing values at the ends of the series that are averages of fewer than five values.

   convert x=y / transformout=( cmovave 5 trim 2 ); 

Normally, the moving time window and cumulative statistics operators ignore missing values and compute their results for the nonmissing values. When preceded by the NOMISS operator, these functions produce a missing result if any value within the time window is missing.

The NOMISS operator does not perform any calculations, but serves to modify the operation of the moving time window operator that follows it. The NOMISS operator has no effect unless it is followed by a moving time window operator.

For example, the following statement computes a five-period moving average of the variable X but produces a missing value when any of the five values are missing.

   convert x=y / transformout=( nomiss movave 5 ); 

The following statement computes the cumulative sum of the variable X but produces a missing value for all periods after the first missing X value.

   convert x=y / transformout=( nomiss cusum ); 

Similar to the NOMISS operator, the MISSONLY operator does not perform any calculations (unless followed by the MEAN option), but it serves to modify the operation of the moving time window operator that follows it. When preceded by the MISSONLY operator, these moving time window operators replace any missing values with the moving statistic and leave nonmissing values unchanged.

For example, the following statement replaces any missing values of the variable X with an exponentially weighted moving average of the past values of X and leaves nonmissing values unchanged. The missing values are interpolated using the specified exponentially weighted moving average. (This is also called simple exponential smoothing.)

   convert x=y / transformout=( missonly ewma 0.3 ); 

The following statement replaces any missing values of the variable X with the overall mean of X.

   convert x=y / transformout=( missonly mean ); 

You can use the SETMISS operator to replace missing values with a specified number. For example, the following statement replaces any missing values of the variable X with the number 8.77.

   convert x=y / transformout=( setmiss 8.77 ); 
Waqar
Calcite | Level 5

Thanks Reeza,

I had looked at it but does not serve my purpose. What I am looking for is another variable say "nrolling" that captures the number of observations used to calculate the standard deviation so what i expect is a variable that will increase in a series from 1 to 36 and stays at 36 for the rest of the data for that particular firm.

when i run proc expand command my data looks something like this

ID     time      return      rollingstd

A     0               0.2          0    

A     1               0.12       0.056

A     2               0.15       0.040

.      .                   .               .

.      .                  .               .

A     35             0.10         0.05

A     36             0.11        0.051    

A     37             0.17        0.053 

.      .                   .               .

.      .                  .               .

A     96              0.15        0.032

B     0               0.10            0

B     1               0.12          0.04

continue.

What i want is in addition to above data an additional variable which i expect to be something like this

ID     time      return      rollingstd          nrollingstd

A     0               0.2          0                         0    

A     1               0.12       0.056                    1    

A     2               0.15       0.040                    2

.      .                   .               .                     .

.      .                  .               .                      .

A     35             0.10         0.05                    35

A     36             0.11        0.051                    36    

A     37             0.17        0.053                    36              

.      .                   .               .                      36

.      .                  .               .                       36                        

A     96              0.15        0.032                   36

B     0               0.10            0                      0

B     1               0.12          0.04                    1

I hope it clarifies things a bit.

Reeza
Super User

Waqar wrote:

However I also need a variable to calculate the number of observations used to calculate a particular standard deviation like rolling_N because later on in my dataset I need to put this condition that if data for a particular company is less than one year (12 months) then for such companies i need to use the average standard deviation of all companies.

If you used trim with 11 months wouldn't the ones that had less than one year be missing? Then you set them to the value you wanted.

Waqar
Calcite | Level 5

when you use the trim 11 month function not only the first 11 but also the last 11 observations become missing. It would be helpful if you see how your recommendation produces output on a sample data. i can send you some sample data if you like. Not sure how to attach it here. Also if I eliminate the values that have less than 12 observations then my means would be affected by this. I need to calculate means using all available data and replacement comes at a much later stage.

Reeza
Super User

Try trimleft or trimright instead (check the docs).

Consider adding a 1 to every record and then use a movingsum on the one column with the same window.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 11097 views
  • 0 likes
  • 2 in conversation