Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- SAS Procedures
- /
- Re: Yearly standard deviation based on quarterly data

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 07-27-2018 11:21 AM
(2978 views)

Hello,

I want to calculate the standard deviation (std) of return on assets (roaq) and cash flows (cfq) based on quarterly data. Assume that below is dataset (my data have about 700,000 observations), the calculation of the standard deviation of ROA and CF for the year 2015 will use data from 2011 to 2015 (5 years * 4 quarters), std of ROA and CF for the year 2016 will use data from 2012 to 2016.

I need only yearly std based on 5-year quarterly data. I dont need quarterly std.

GVKEY | DATADATE | FYEARQ | FQTR | roaq | CFQ |

1004 | 20110831 | 2011 | 1 | 0.00947287 | 0.016586 |

1004 | 20111130 | 2011 | 2 | 0.00965134 | 0.016939 |

1004 | 20120229 | 2011 | 3 | 0.00930643 | 0.016289 |

1004 | 20120531 | 2011 | 4 | 0.00586568 | 0.023345 |

1004 | 20120831 | 2012 | 1 | 0.00838594 | 0.018386 |

1004 | 20121130 | 2012 | 2 | 0.00813045 | 0.018555 |

1004 | 20130228 | 2012 | 3 | 0.0082995 | 0.013811 |

1004 | 20130531 | 2012 | 4 | 0.00028078 | 0.025198 |

1004 | 20130831 | 2013 | 1 | 0.00840691 | 0.019506 |

1004 | 20131130 | 2013 | 2 | 0.00811726 | 0.019542 |

1004 | 20140228 | 2013 | 3 | 0.00763635 | 0.018812 |

1004 | 20140531 | 2013 | 4 | 0.00677427 | 0.017701 |

1004 | 20140831 | 2014 | 1 | 0.0053505 | 0.012474 |

1004 | 20141130 | 2014 | 2 | 0.00623962 | 0.012711 |

1004 | 20150228 | 2014 | 3 | 0.00089043 | 0.007927 |

1004 | 20150531 | 2014 | 4 | -0.0541914 | -0.00776 |

1004 | 20150831 | 2015 | 1 | 0.00478283 | 0.014969 |

1004 | 20151130 | 2015 | 2 | 0.00599505 | 0.015947 |

1004 | 20160229 | 2015 | 3 | 0.0065046 | 0.016882 |

1004 | 20160531 | 2015 | 4 | 0.00818251 | 0.018592 |

When I calculated std based on annual data, I used the following macro, which works well (where firmyear = gvkey+fyear). Is there anyway to modify this macro so that it can use quarterly data rather than yearly data. I prefer this macro as it is efficient.

%macro roastdev(dsin=funda, dsout=funda);

proc sql;

create table comp_roa1 as

select a.firmyear, b.roa, b.firmyear as prev_firmyear

from

&dsin a, &dsin b

where

a.gvkey = b.gvkey

and a.fyear - 4 <= b.fyear <= a.fyear;

quit;

proc sort data = comp_roa1; by firmyear;run;

/* compute stddev */

proc means data=comp_roa1 noprint;

output out=comp_roa2 std=/autoname;

var roa;

by firmyear ;

run;

/* dataset with standard deviation of roa: firmyear, roa_stddev require at least 4 obs

*/

data comp_roa2;

set comp_roa2;

if _freq_ >= 4;

run;

/* create output dataset */

proc sql;

create table &dsout as

select a.*, b.roa_stddev from &dsin a left join comp_roa2 b on a.firmyear = b.firmyear;

quit;

%mend;

%roastdev(dsin=work.funda, dsout=funda);

proc datasets library=work; delete comp_roa1 comp_roa2; run;

Thank you very much!

Best regards, Thierry

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I am using SAS 9.4 TS1M3 and do not get that error, so I assume you are using an older version.

You can change the temporary array to a normal array of variables, and retain those variables:

```
data v_have_template / view=v_have_template;
set have (keep=gvkey fyearq);
by gvkey fyearq;
if first.fyearq then do fqtr=1 to 4;
output;
end;
run;
data want (drop=_:);
merge have v_have_template;
by gvkey fyearq fqtr;
array _roa{0:19} _roa0-_roa19; /* Rolling 20 qtrs of data */
retain _roa: ;
if first.gvkey then call missing(of _roa{*});
_roa{mod(4*fyearq+fqtr,20)}=roaq; /* Populate array */
if fqtr=4; /* Process only Q4 records */
n_roaq=n(of _roa{*}); /* Count non-missing roaq values */
if n_roaq >=16 then std_roaq=std(of _roa{*}); /* Editted: changed 4 to 16 */
run;
```

--------------------------

The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for

Allow PROC SORT to output multiple datasets

--------------------------

The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for

Allow PROC SORT to output multiple datasets

--------------------------

20 REPLIES 20

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

That code doesn't calculate a yearly average, it calculates a 5 year moving average.

What are you defining as yearly standard deviation? How are you expecting to use it?

Aggregation often suppresses variability, ie smooths out curves, so it's possible you're understating the standard deviation.

If you it's how much does the measure vary over a year, you'll first need to aggregate your data to an annual value somehow. Given no context for the data it's hard to recommend an appropriate method. If it was sales data it may be perfectly valid to sum the values, in other cases you should have a weighted average.

@Thierrynguyen wrote:

Hello,

I want to calculate the standard deviation (std) of return on assets (roaq) and cash flows (cfq) based on quarterly data. Assume that below is dataset (my data have about 700,000 observations), the calculation of the standard deviation of ROA and CF for the year 2015 will use data from 2011 to 2015 (5 years * 4 quarters), std of ROA and CF for the year 2016 will use data from 2012 to 2016.

I need only yearly std based on 5-year quarterly data. I dont need quarterly std.

GVKEY DATADATE FYEARQ FQTR roaq CFQ 1004 20110831 2011 1 0.00947287 0.016586 1004 20111130 2011 2 0.00965134 0.016939 1004 20120229 2011 3 0.00930643 0.016289 1004 20120531 2011 4 0.00586568 0.023345 1004 20120831 2012 1 0.00838594 0.018386 1004 20121130 2012 2 0.00813045 0.018555 1004 20130228 2012 3 0.0082995 0.013811 1004 20130531 2012 4 0.00028078 0.025198 1004 20130831 2013 1 0.00840691 0.019506 1004 20131130 2013 2 0.00811726 0.019542 1004 20140228 2013 3 0.00763635 0.018812 1004 20140531 2013 4 0.00677427 0.017701 1004 20140831 2014 1 0.0053505 0.012474 1004 20141130 2014 2 0.00623962 0.012711 1004 20150228 2014 3 0.00089043 0.007927 1004 20150531 2014 4 -0.0541914 -0.00776 1004 20150831 2015 1 0.00478283 0.014969 1004 20151130 2015 2 0.00599505 0.015947 1004 20160229 2015 3 0.0065046 0.016882 1004 20160531 2015 4 0.00818251 0.018592

When I calculated std based on annual data, I used the following macro, which works well (where firmyear = gvkey+fyear). Is there anyway to modify this macro so that it can use quarterly data rather than yearly data. I prefer this macro as it is efficient.

%macro roastdev(dsin=funda, dsout=funda);

proc sql;

create table comp_roa1 as

select a.firmyear, b.roa, b.firmyear as prev_firmyear

from

&dsin a, &dsin b

where

a.gvkey = b.gvkey

and a.fyear - 4 <= b.fyear <= a.fyear;

quit;

proc sort data = comp_roa1; by firmyear;run;

/* compute stddev */

proc means data=comp_roa1 noprint;

output out=comp_roa2 std=/autoname;

var roa;

by firmyear ;

run;

/* dataset with standard deviation of roa: firmyear, roa_stddev require at least 4 obs

*/

data comp_roa2;

set comp_roa2;

if _freq_ >= 4;

run;

/* create output dataset */

proc sql;

create table &dsout as

select a.*, b.roa_stddev from &dsin a left join comp_roa2 b on a.firmyear = b.firmyear;

quit;

%mend;

%roastdev(dsin=work.funda, dsout=funda);

proc datasets library=work; delete comp_roa1 comp_roa2; run;

Thank you very much!

Best regards, Thierry

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you @Reeza. You are right, the macro code is to calculate moving std from year t-4 to year t. In other words, the calculation of std in year t requires data from year t-4 to t. I checked the and it works well.

For my question, I need to calculate std in year t (or I named it yearly std), which is defined as the standard deviation of 20 quarters from Q1 in year t-4 to Q4 in year t. I use this variable as a measure of risk.

Best regards,

Thierry

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Also, I do not require 20 quarters to calculate std, say I require 16 quarters data or similar.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I am confused with your further correspondence than your initial question

You mentioned - "I need only yearly std based on 5-year quarterly data. I dont need quarterly std."

```
data have;
input GVKEY DATADATE : yymmdd10. FYEARQ FQTR roaq CFQ;
format DATADATE yymmdd10.;
cards;
1004 20110831 2011 1 0.00947287 0.016586
1004 20111130 2011 2 0.00965134 0.016939
1004 20120229 2011 3 0.00930643 0.016289
1004 20120531 2011 4 0.00586568 0.023345
1004 20120831 2012 1 0.00838594 0.018386
1004 20121130 2012 2 0.00813045 0.018555
1004 20130228 2012 3 0.0082995 0.013811
1004 20130531 2012 4 0.00028078 0.025198
1004 20130831 2013 1 0.00840691 0.019506
1004 20131130 2013 2 0.00811726 0.019542
1004 20140228 2013 3 0.00763635 0.018812
1004 20140531 2013 4 0.00677427 0.017701
1004 20140831 2014 1 0.0053505 0.012474
1004 20141130 2014 2 0.00623962 0.012711
1004 20150228 2014 3 0.00089043 0.007927
1004 20150531 2014 4 -0.0541914 -0.00776
1004 20150831 2015 1 0.00478283 0.014969
1004 20151130 2015 2 0.00599505 0.015947
1004 20160229 2015 3 0.0065046 0.016882
1004 20160531 2015 4 0.00818251 0.018592
;
/*grouping every 5 years grp 1 aka 2011-2015*/
data temp;
set have;
by GVKEY FYEARQ;
if first.gvkey then do;n=0;grp=1;end;
if FYEARQ ne lag(FYEARQ) then n+1;
if n>5 then do; grp+1;n=1;end;
run;
proc means data =temp nway noprint;
class gvkey grp;
var cfq;
output out=want(drop=_:) std=/ autoname;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Ok, so now you just want the moving average within a 5 year period as long as you have more than 4 years (or 16 quarters of data) and don't care about 'yearly' measures? Does it matter where missing data exists?

Do you have a SAS/ETS licensed? If you do, PROC EXPAND is a good option though not sure it can handle the 16 quarters of data requirement.

Otherwise, modifying your macro would be just modifying the where filters. Have you tried that?

@Thierrynguyen wrote:

Thank you @Reeza. You are right, the macro code is to calculate moving std from year t-4 to year t. In other words, the calculation of std in year t requires data from year t-4 to t. I checked the and it works well.

For my question, I need to calculate std in year t (or I named it yearly std), which is defined as the standard deviation of 20 quarters from Q1 in year t-4 to Q4 in year t. I use this variable as a measure of risk.

Best regards,

Thierry

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@Reeza May i request a favor in educating me a tiny bit on the moving average thing pertaining to the dataset plz using like 3-4 records whenever you have time. I am intrigued with zero knowledge at this point although I tend to enroll for a course Time series and forecasting next quarter at my university.

Not bothering for expansive details. just a pointer with a dataset and what it means will do. I will dig in later with those inputs. Cheers!

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@novinosrin there are a few methods to deal with moving averages.

It would be in order of preference:

1. Proc expand with time series -> most efficient but requires ETS license. Unfortunately also requires a complete data set, so no missing time intervals.

2. Temporary Arrays -> I like this approach because it's flexible in the size of the lags required and is relatively easy to program. But changing to a 2 or 3 year lag is easy.

I think Dow loops are possible but to be honest, I don't use them very often and have to look up the code every time I need it, which means I don't use it often and I never remember it 🙂

This is a decent write up and examples of the various options:

http://sasnrd.com/moving-average/

@Thierrynguyen the examples below or in the link above offer a few alternatives that will work for you.

```
proc sort data=sashelp.stocks out=stocks;
by stock date;
run;
data want;
array p{0:3} _temporary_;
set stocks;
by stock;
if first.stock then call missing(of p{*});
p{mod(_n_,4)} = open;
*if you want to have a certain number of records before you calculate statistics;
if n(of p{*})>2 then do;
lowest = min(of p{*});
highest = max(of p{*});
end;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@Reeza Thank you. So in essence, does moving avg mean--> avg(x,lag(x)) ? regardless of technology we use

EDIT: Got it. Thank you. I just needed to get my mind to grip with some jargon .

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

In general yes, but the definition can vary a bit. For example it could be a leading window or a window statistic or rolling average. The terms are kind of used interchangeably and you need to basically check what a person means when they use it :S.

I'm finding that more and more as 'data science' brings in computer science terminology rather than statistical terminology.

@novinosrin wrote:

@Reeza Thank you. So in essence, does moving avg mean--> avg(x,lag(x)) ? regardless of technology we use

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You want yearly std of quafterly data for each gvkey/fiscal year.

```
proc summary data=have nway ;
output out=want (drop=_type_) std(roaq cfq)= /autoname;
class gvkey fyear;
run;
```

This is probably more efficient than your proc sql approach, although unlikely to be as efficient as a data step - given that your Compustat data are almost certainly sorted by gvkey/fyear/fqtr.

The will produce variables _freq_, STD_roaq and STD_cfq. If you want to eliminate fiscal years with less than four quarters, you could do this:

```
proc summary data=have nway ;
output out=want (drop=_type_ where=(_freq_=4)) std(roaq cfq)= /autoname;
class gvkey fyear;
run;
```

The advantage of this approach is that proc summary offers many other statistics in addition to std.

--------------------------

The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for

Allow PROC SORT to output multiple datasets

--------------------------

The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for

Allow PROC SORT to output multiple datasets

--------------------------

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you very much! Unfortunately, the codes offered by you do not solve my issue. @mkeintz provided a code to calculate yearly std based on 4 quarter data (from here, it is sure that I can calculate the moving average of std for 5 years). However, it is not really what I want. I need to calculate std based on 20 quarter data, or 5 years, and require at least non-missing 16 quarters (randomly is ok). @Reeza and @mkeintz do you have any other ideas?

Thank you all very much! Especially, thanks to @Reeza as you have proved that you are an outstanding expert in SAS (you provided several solutions for me so far).

Best regards,

Tri

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@Reeza, yes I can use proc expand.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

OK, now I understand that you want rolling std's for quarterly data spanning 5 years, with a minimum requirement of 16 quarters of data.

Questions:

- Do you want rolling quarterly std's, or just the std's for period ending in Q4?
- When a quarter is missing, do you mean that the entire record for that GVKEY/FYEARQ/FQTR is missing, or do you mean that the record is present, but the variable of interest has a missing value?

--------------------------

The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for

Allow PROC SORT to output multiple datasets

--------------------------

The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for

Allow PROC SORT to output multiple datasets

--------------------------

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thanks, here is more explanation:

1/ Yes I only want for period ending in Q4, assuming that Q4 is ending of fiscal year.

2/ not clear what is your question. But I mean that if firms released, e.g., only 16 out of 20 quarterly reports in 5 years (from year t-4 to year t), we still have data to calculate standard deviations. Hence, there is non-missing for Q4 of year t. Is that clear?

Thanks,

Best regards,

Thierry

1/ Yes I only want for period ending in Q4, assuming that Q4 is ending of fiscal year.

2/ not clear what is your question. But I mean that if firms released, e.g., only 16 out of 20 quarterly reports in 5 years (from year t-4 to year t), we still have data to calculate standard deviations. Hence, there is non-missing for Q4 of year t. Is that clear?

Thanks,

Best regards,

Thierry

**SAS Innovate 2025** is scheduled for May 6-9 in Orlando, FL. Sign up to be **first to learn** about the agenda and registration!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Ready to level-up your skills? Choose your own adventure.