turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- rolling standard deviation (proc expand)

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-18-2014 01:27 PM

Hi,

I have return data for different firms and need to calculate 36 month rolling standard deviation. I used the proc expand command as follows

PROC EXPAND DATA=a1 OUT=rollingstd;

CONVERT return=std / TRANSFORMOUT=(MOVSTD 35);

by permno;

RUN;

However I also need a variable to calculate the number of observations used to calculate a particular standard deviation like rolling_N because later on in my dataset I need to put this condition that if data for a particular company is less than one year (12 months) then for such companies i need to use the average standard deviation of all companies.

not sure how to calculate this rolling_N variable that lets me know how many observations were used in calculating the given standard deviation number from the proc expand command.

Many thanks for your help in advance.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Waqar

03-18-2014 01:35 PM

Wouldn't the N be 35 as you've specified?

And my assumption would be that less than 35 would be blank/missing.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

03-18-2014 01:39 PM

No I get a number for every observation except the first observation for each firm. Yes MOVSTD can be set to 36.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Waqar

03-18-2014 01:47 PM

Look into the options for dealing with missing values in the docs

You can truncate the length of the result series by using the TRIM, TRIMLEFT, and TRIMRIGHT operators to set values to missing at the beginning or end of the series.

You can use these functions to trim the results of moving time window operators so that the result series contains only values computed from a full width time window. For example, the following statements compute a centered five-period moving average of *X*, and they set to missing values at the ends of the series that are averages of fewer than five values.

convert x=y / transformout=( cmovave 5 trim 2 );

Normally, the moving time window and cumulative statistics operators ignore missing values and compute their results for the nonmissing values. When preceded by the NOMISS operator, these functions produce a missing result if any value within the time window is missing.

The NOMISS operator does not perform any calculations, but serves to modify the operation of the moving time window operator that follows it. The NOMISS operator has no effect unless it is followed by a moving time window operator.

For example, the following statement computes a five-period moving average of the variable *X* but produces a missing value when any of the five values are missing.

convert x=y / transformout=( nomiss movave 5 );

The following statement computes the cumulative sum of the variable *X* but produces a missing value for all periods after the first missing X value.

convert x=y / transformout=( nomiss cusum );

Similar to the NOMISS operator, the MISSONLY operator does not perform any calculations (unless followed by the MEAN option), but it serves to modify the operation of the moving time window operator that follows it. When preceded by the MISSONLY operator, these moving time window operators replace any missing values with the moving statistic and leave nonmissing values unchanged.

For example, the following statement replaces any missing values of the variable *X* with an exponentially weighted moving average of the past values of *X* and leaves nonmissing values unchanged. The missing values are interpolated using the specified exponentially weighted moving average. (This is also called simple exponential smoothing.)

convert x=y / transformout=( missonly ewma 0.3 );

The following statement replaces any missing values of the variable *X* with the overall mean of *X*.

convert x=y / transformout=( missonly mean );

You can use the SETMISS operator to replace missing values with a specified number. For example, the following statement replaces any missing values of the variable *X* with the number 8.77.

convert x=y / transformout=( setmiss 8.77 );

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

03-18-2014 02:27 PM

Thanks Reeza,

I had looked at it but does not serve my purpose. What I am looking for is another variable say "nrolling" that captures the number of observations used to calculate the standard deviation so what i expect is a variable that will increase in a series from 1 to 36 and stays at 36 for the rest of the data for that particular firm.

when i run proc expand command my data looks something like this

ID time return rollingstd

A 0 0.2 0

A 1 0.12 0.056

A 2 0.15 0.040

. . . .

. . . .

A 35 0.10 0.05

A 36 0.11 0.051

A 37 0.17 0.053

. . . .

. . . .

A 96 0.15 0.032

B 0 0.10 0

B 1 0.12 0.04

continue.

What i want is in addition to above data an additional variable which i expect to be something like this

ID time return rollingstd nrollingstd

A 0 0.2 0 0

A 1 0.12 0.056 1

A 2 0.15 0.040 2

. . . . .

. . . . .

A 35 0.10 0.05 35

A 36 0.11 0.051 36

A 37 0.17 0.053 36

. . . . 36

. . . . 36

A 96 0.15 0.032 36

B 0 0.10 0 0

B 1 0.12 0.04 1

I hope it clarifies things a bit.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Waqar

03-18-2014 03:05 PM

Waqar wrote:

However I also need a variable to calculate the number of observations used to calculate a particular standard deviation like rolling_N because later on in my dataset I need to put this condition that if data for a particular company is less than one year (12 months) then for such companies i need to use the average standard deviation of all companies.

If you used trim with 11 months wouldn't the ones that had less than one year be missing? Then you set them to the value you wanted.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

03-18-2014 03:24 PM

when you use the trim 11 month function not only the first 11 but also the last 11 observations become missing. It would be helpful if you see how your recommendation produces output on a sample data. i can send you some sample data if you like. Not sure how to attach it here. Also if I eliminate the values that have less than 12 observations then my means would be affected by this. I need to calculate means using all available data and replacement comes at a much later stage.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Waqar

03-18-2014 03:29 PM

Try trimleft or trimright instead (check the docs).

Consider adding a 1 to every record and then use a movingsum on the one column with the same window.