Hi,
I have return data for different firms and need to calculate 36 month rolling standard deviation. I used the proc expand command as follows
PROC EXPAND DATA=a1 OUT=rollingstd;
CONVERT return=std / TRANSFORMOUT=(MOVSTD 35);
by permno;
RUN;
However I also need a variable to calculate the number of observations used to calculate a particular standard deviation like rolling_N because later on in my dataset I need to put this condition that if data for a particular company is less than one year (12 months) then for such companies i need to use the average standard deviation of all companies.
not sure how to calculate this rolling_N variable that lets me know how many observations were used in calculating the given standard deviation number from the proc expand command.
Many thanks for your help in advance.
Wouldn't the N be 35 as you've specified?
And my assumption would be that less than 35 would be blank/missing.
No I get a number for every observation except the first observation for each firm. Yes MOVSTD can be set to 36.
Look into the options for dealing with missing values in the docs
You can truncate the length of the result series by using the TRIM, TRIMLEFT, and TRIMRIGHT operators to set values to missing at the beginning or end of the series.
You can use these functions to trim the results of moving time window operators so that the result series contains only values computed from a full width time window. For example, the following statements compute a centered five-period moving average of X, and they set to missing values at the ends of the series that are averages of fewer than five values.
convert x=y / transformout=( cmovave 5 trim 2 );
Normally, the moving time window and cumulative statistics operators ignore missing values and compute their results for the nonmissing values. When preceded by the NOMISS operator, these functions produce a missing result if any value within the time window is missing.
The NOMISS operator does not perform any calculations, but serves to modify the operation of the moving time window operator that follows it. The NOMISS operator has no effect unless it is followed by a moving time window operator.
For example, the following statement computes a five-period moving average of the variable X but produces a missing value when any of the five values are missing.
convert x=y / transformout=( nomiss movave 5 );
The following statement computes the cumulative sum of the variable X but produces a missing value for all periods after the first missing X value.
convert x=y / transformout=( nomiss cusum );
Similar to the NOMISS operator, the MISSONLY operator does not perform any calculations (unless followed by the MEAN option), but it serves to modify the operation of the moving time window operator that follows it. When preceded by the MISSONLY operator, these moving time window operators replace any missing values with the moving statistic and leave nonmissing values unchanged.
For example, the following statement replaces any missing values of the variable X with an exponentially weighted moving average of the past values of X and leaves nonmissing values unchanged. The missing values are interpolated using the specified exponentially weighted moving average. (This is also called simple exponential smoothing.)
convert x=y / transformout=( missonly ewma 0.3 );
The following statement replaces any missing values of the variable X with the overall mean of X.
convert x=y / transformout=( missonly mean );
You can use the SETMISS operator to replace missing values with a specified number. For example, the following statement replaces any missing values of the variable X with the number 8.77.
convert x=y / transformout=( setmiss 8.77 );
Thanks Reeza,
I had looked at it but does not serve my purpose. What I am looking for is another variable say "nrolling" that captures the number of observations used to calculate the standard deviation so what i expect is a variable that will increase in a series from 1 to 36 and stays at 36 for the rest of the data for that particular firm.
when i run proc expand command my data looks something like this
ID time return rollingstd
A 0 0.2 0
A 1 0.12 0.056
A 2 0.15 0.040
. . . .
. . . .
A 35 0.10 0.05
A 36 0.11 0.051
A 37 0.17 0.053
. . . .
. . . .
A 96 0.15 0.032
B 0 0.10 0
B 1 0.12 0.04
continue.
What i want is in addition to above data an additional variable which i expect to be something like this
ID time return rollingstd nrollingstd
A 0 0.2 0 0
A 1 0.12 0.056 1
A 2 0.15 0.040 2
. . . . .
. . . . .
A 35 0.10 0.05 35
A 36 0.11 0.051 36
A 37 0.17 0.053 36
. . . . 36
. . . . 36
A 96 0.15 0.032 36
B 0 0.10 0 0
B 1 0.12 0.04 1
I hope it clarifies things a bit.
Waqar wrote:
However I also need a variable to calculate the number of observations used to calculate a particular standard deviation like rolling_N because later on in my dataset I need to put this condition that if data for a particular company is less than one year (12 months) then for such companies i need to use the average standard deviation of all companies.
If you used trim with 11 months wouldn't the ones that had less than one year be missing? Then you set them to the value you wanted.
when you use the trim 11 month function not only the first 11 but also the last 11 observations become missing. It would be helpful if you see how your recommendation produces output on a sample data. i can send you some sample data if you like. Not sure how to attach it here. Also if I eliminate the values that have less than 12 observations then my means would be affected by this. I need to calculate means using all available data and replacement comes at a much later stage.
Try trimleft or trimright instead (check the docs).
Consider adding a 1 to every record and then use a movingsum on the one column with the same window.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.