BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Thierrynguyen
Calcite | Level 5

Hello,

 

I want to calculate the standard deviation (std) of return on assets (roaq) and cash flows (cfq) based on quarterly data. Assume that below is dataset (my data have about 700,000 observations), the calculation of the standard deviation of ROA and CF for the year 2015 will use data from 2011 to 2015 (5 years * 4 quarters), std of ROA and CF for the year 2016 will use data from 2012 to 2016. 

 

I need only yearly std based on 5-year quarterly data. I dont need quarterly std.

 

GVKEYDATADATEFYEARQFQTRroaqCFQ
100420110831201110.009472870.016586
100420111130201120.009651340.016939
100420120229201130.009306430.016289
100420120531201140.005865680.023345
100420120831201210.008385940.018386
100420121130201220.008130450.018555
100420130228201230.00829950.013811
100420130531201240.000280780.025198
100420130831201310.008406910.019506
100420131130201320.008117260.019542
100420140228201330.007636350.018812
100420140531201340.006774270.017701
100420140831201410.00535050.012474
100420141130201420.006239620.012711
100420150228201430.000890430.007927
10042015053120144-0.0541914-0.00776
100420150831201510.004782830.014969
100420151130201520.005995050.015947
100420160229201530.00650460.016882
100420160531201540.008182510.018592

 

When I calculated std based on annual data, I used the following macro, which works well (where firmyear = gvkey+fyear). Is there anyway to modify this macro so that it can use quarterly data rather than yearly data. I prefer this macro as it is efficient. 

                                                                                                                                                                                         

                                                                                                                                                                                        

%macro roastdev(dsin=funda, dsout=funda);                                                                                                                                                     

                                                                                                                                                                                                        

proc sql;                                                                                                                                                                                               

  create table comp_roa1 as                                                                                                                                                                            

  select a.firmyear, b.roa, b.firmyear as prev_firmyear                                                                                                                                                

  from                                                                                                                                                                                                  

    &dsin a, &dsin b                                                                                                                                                                                    

  where                                                                                                                                                                                                

    a.gvkey = b.gvkey                                                                                                                                                                                  

    and a.fyear - 4 <= b.fyear <= a.fyear;                                                                                                                         

quit;                                                                                                                                                                                                   

                                                                                                                                                                                                        

proc sort data = comp_roa1; by firmyear;run;                                                                                                                                                           

                                                                                                                                                                                                        

/* compute stddev */                                                                                                                                                                                    

proc means data=comp_roa1  noprint;                                                                                                                                                                     

output out=comp_roa2 std=/autoname;                                                                                                                                                                     

var roa;                                                                                                                                                                                               

by firmyear ;                                                                                                                                                                                           

run;                                                                                                                                                                                                    

                                                                                                                                                                                                        

/* dataset with standard deviation of roa:  firmyear, roa_stddev require at least 4 obs                                                                                                                

*/                                                                                                                                                                                                     

data comp_roa2;                                                                                                                                                                                         

set comp_roa2;                                                                                                                                                                                          

if _freq_ >= 4;                                                                                                                                                                                         

run;                                                                                                                                                                                                   

                                                                                                                                                                                                        

/* create output dataset */                                                                                                                                                                             

proc sql;                                                                                                                                                                                               

  create table &dsout as                                                                                                                                                                                

  select a.*, b.roa_stddev from &dsin a left join comp_roa2 b on a.firmyear = b.firmyear;                                                                                                              

quit;                                                                                                                                                                                                   

                                                                                                                                                                                                        

%mend;                                                                                                                                                                                                  

                                                                                                                                                                                                       

%roastdev(dsin=work.funda, dsout=funda);                                                                                                                                                     

                                                                                                                                                                                        

proc datasets library=work; delete comp_roa1 comp_roa2; run; 

 

Thank you very much!

 

Best regards, Thierry                                                                                                                                          

                                                                                                                                                                                                        

                                                                                                                                                                                                        

                                                                                                                                                                                                        

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

I am using   SAS 9.4 TS1M3 and do not get that error, so I assume you are using an older version.

 

You can change the temporary array to a normal array of variables, and retain those variables:

 

data v_have_template / view=v_have_template;
  set have (keep=gvkey fyearq);
  by gvkey fyearq;
  if first.fyearq then do fqtr=1 to 4;
    output;
  end;
run;

data want (drop=_:);
  merge have v_have_template;
  by gvkey fyearq fqtr;

  array _roa{0:19} _roa0-_roa19;  /* Rolling 20 qtrs of data */
  retain _roa: ;
  if first.gvkey then call missing(of _roa{*});

  _roa{mod(4*fyearq+fqtr,20)}=roaq; /* Populate array */

  if fqtr=4;   /* Process only Q4 records */

  n_roaq=n(of _roa{*});  /* Count non-missing roaq values */
  if n_roaq >=16 then std_roaq=std(of _roa{*});  /* Editted: changed 4 to 16 */
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

20 REPLIES 20
Reeza
Super User

That code doesn't calculate a yearly average, it calculates a 5 year moving average. 

 

What are you defining as yearly standard deviation? How are you expecting to use it?

 

Aggregation often suppresses variability, ie smooths out curves, so it's possible you're understating the standard deviation.

 

If you it's how much does the measure vary over a year, you'll first need to aggregate your data to an annual value somehow. Given no context for the data it's hard to recommend an appropriate method. If it was sales data it may be perfectly valid to sum the values, in other cases you should have a weighted average. 

 


@Thierrynguyen wrote:

Hello,

 

I want to calculate the standard deviation (std) of return on assets (roaq) and cash flows (cfq) based on quarterly data. Assume that below is dataset (my data have about 700,000 observations), the calculation of the standard deviation of ROA and CF for the year 2015 will use data from 2011 to 2015 (5 years * 4 quarters), std of ROA and CF for the year 2016 will use data from 2012 to 2016. 

 

I need only yearly std based on 5-year quarterly data. I dont need quarterly std.

 

GVKEY DATADATE FYEARQ FQTR roaq CFQ
1004 20110831 2011 1 0.00947287 0.016586
1004 20111130 2011 2 0.00965134 0.016939
1004 20120229 2011 3 0.00930643 0.016289
1004 20120531 2011 4 0.00586568 0.023345
1004 20120831 2012 1 0.00838594 0.018386
1004 20121130 2012 2 0.00813045 0.018555
1004 20130228 2012 3 0.0082995 0.013811
1004 20130531 2012 4 0.00028078 0.025198
1004 20130831 2013 1 0.00840691 0.019506
1004 20131130 2013 2 0.00811726 0.019542
1004 20140228 2013 3 0.00763635 0.018812
1004 20140531 2013 4 0.00677427 0.017701
1004 20140831 2014 1 0.0053505 0.012474
1004 20141130 2014 2 0.00623962 0.012711
1004 20150228 2014 3 0.00089043 0.007927
1004 20150531 2014 4 -0.0541914 -0.00776
1004 20150831 2015 1 0.00478283 0.014969
1004 20151130 2015 2 0.00599505 0.015947
1004 20160229 2015 3 0.0065046 0.016882
1004 20160531 2015 4 0.00818251 0.018592

 

When I calculated std based on annual data, I used the following macro, which works well (where firmyear = gvkey+fyear). Is there anyway to modify this macro so that it can use quarterly data rather than yearly data. I prefer this macro as it is efficient. 

                                                                                                                                                                                         

                                                                                                                                                                                        

%macro roastdev(dsin=funda, dsout=funda);                                                                                                                                                     

                                                                                                                                                                                                        

proc sql;                                                                                                                                                                                               

  create table comp_roa1 as                                                                                                                                                                            

  select a.firmyear, b.roa, b.firmyear as prev_firmyear                                                                                                                                                

  from                                                                                                                                                                                                  

    &dsin a, &dsin b                                                                                                                                                                                    

  where                                                                                                                                                                                                

    a.gvkey = b.gvkey                                                                                                                                                                                  

    and a.fyear - 4 <= b.fyear <= a.fyear;                                                                                                                         

quit;                                                                                                                                                                                                   

                                                                                                                                                                                                        

proc sort data = comp_roa1; by firmyear;run;                                                                                                                                                           

                                                                                                                                                                                                        

/* compute stddev */                                                                                                                                                                                    

proc means data=comp_roa1  noprint;                                                                                                                                                                     

output out=comp_roa2 std=/autoname;                                                                                                                                                                     

var roa;                                                                                                                                                                                               

by firmyear ;                                                                                                                                                                                           

run;                                                                                                                                                                                                    

                                                                                                                                                                                                        

/* dataset with standard deviation of roa:  firmyear, roa_stddev require at least 4 obs                                                                                                                

*/                                                                                                                                                                                                     

data comp_roa2;                                                                                                                                                                                         

set comp_roa2;                                                                                                                                                                                          

if _freq_ >= 4;                                                                                                                                                                                         

run;                                                                                                                                                                                                   

                                                                                                                                                                                                        

/* create output dataset */                                                                                                                                                                             

proc sql;                                                                                                                                                                                               

  create table &dsout as                                                                                                                                                                                

  select a.*, b.roa_stddev from &dsin a left join comp_roa2 b on a.firmyear = b.firmyear;                                                                                                              

quit;                                                                                                                                                                                                   

                                                                                                                                                                                                        

%mend;                                                                                                                                                                                                  

                                                                                                                                                                                                       

%roastdev(dsin=work.funda, dsout=funda);                                                                                                                                                     

                                                                                                                                                                                        

proc datasets library=work; delete comp_roa1 comp_roa2; run; 

 

Thank you very much!

 

Best regards, Thierry                                                                                                                                          

                                                                                                                                                                                                        

                                                                                                                                                                                                        

                                                                                                                                                                                                        

 


 

 

Thierrynguyen
Calcite | Level 5

Thank you @Reeza. You are right, the macro code is to calculate moving std from year t-4 to year t. In other words, the calculation of std in year t requires data from year t-4 to t. I checked the and it works well.

 

For my question, I need to calculate std in year t (or I named it yearly std), which is defined as the standard deviation of 20 quarters from Q1 in year t-4 to Q4 in year t. I use this variable as a measure of risk.

 

Best regards,

Thierry

Thierrynguyen
Calcite | Level 5

Also, I do not require 20 quarters to calculate std, say I require 16 quarters data or similar.

novinosrin
Tourmaline | Level 20

I am confused with your further correspondence than your initial question

You mentioned - "I need only yearly std based on 5-year quarterly data. I dont need quarterly std."

 

 

data have;
input GVKEY	DATADATE : yymmdd10.	FYEARQ	FQTR	roaq	CFQ;
format DATADATE yymmdd10.;
cards;
1004	20110831	2011	1	0.00947287	0.016586
1004	20111130	2011	2	0.00965134	0.016939
1004	20120229	2011	3	0.00930643	0.016289
1004	20120531	2011	4	0.00586568	0.023345
1004	20120831	2012	1	0.00838594	0.018386
1004	20121130	2012	2	0.00813045	0.018555
1004	20130228	2012	3	0.0082995	0.013811
1004	20130531	2012	4	0.00028078	0.025198
1004	20130831	2013	1	0.00840691	0.019506
1004	20131130	2013	2	0.00811726	0.019542
1004	20140228	2013	3	0.00763635	0.018812
1004	20140531	2013	4	0.00677427	0.017701
1004	20140831	2014	1	0.0053505	0.012474
1004	20141130	2014	2	0.00623962	0.012711
1004	20150228	2014	3	0.00089043	0.007927
1004	20150531	2014	4	-0.0541914	-0.00776
1004	20150831	2015	1	0.00478283	0.014969
1004	20151130	2015	2	0.00599505	0.015947
1004	20160229	2015	3	0.0065046	0.016882
1004	20160531	2015	4	0.00818251	0.018592
;
/*grouping every 5 years grp 1 aka 2011-2015*/
data temp;
set have;
by GVKEY FYEARQ;
if first.gvkey then do;n=0;grp=1;end;
if FYEARQ ne lag(FYEARQ) then n+1;
if n>5 then do; grp+1;n=1;end;
run;

proc means data =temp nway noprint;
class gvkey grp;
var cfq;
output out=want(drop=_:) std=/ autoname;
run;

 

Reeza
Super User

Ok, so now you just want the moving average within a 5 year period as long as you have more than 4 years (or 16 quarters of data) and don't care about 'yearly' measures? Does it matter where missing data exists?  

 

Do you have a SAS/ETS licensed? If you do, PROC EXPAND is a good option though not sure it can handle the 16 quarters of data requirement. 

 

Otherwise, modifying your macro would be just modifying the where filters. Have you tried that?

 


@Thierrynguyen wrote:

Thank you @Reeza. You are right, the macro code is to calculate moving std from year t-4 to year t. In other words, the calculation of std in year t requires data from year t-4 to t. I checked the and it works well.

 

For my question, I need to calculate std in year t (or I named it yearly std), which is defined as the standard deviation of 20 quarters from Q1 in year t-4 to Q4 in year t. I use this variable as a measure of risk.

 

Best regards,

Thierry


 

 

novinosrin
Tourmaline | Level 20

@Reeza May i request a favor in educating me a tiny bit on the moving average thing pertaining to the dataset plz using like 3-4 records whenever you have time. I am intrigued with zero knowledge at this point although I tend to enroll for a course Time series and forecasting next quarter at my university.

 

Not bothering for expansive details. just a pointer with a dataset and what it means will do. I will dig in later with those inputs. Cheers!

Reeza
Super User

@novinosrin there are a few methods to deal with moving averages. 

It would be in order of preference:

 

1. Proc expand with time series -> most efficient but requires ETS license. Unfortunately also requires a complete data set, so no missing time intervals. 

2. Temporary Arrays -> I like this approach because it's flexible in the size of the lags required and is relatively easy to program. But changing to a 2 or 3 year lag is easy. 

 

I think Dow loops are possible but to be honest, I don't use them very often and have to look up the code every time I need it, which means I don't use it often and I never remember it 🙂

 

This is a decent write up and examples of the various options:

http://sasnrd.com/moving-average/

 

@Thierrynguyen the examples below or in the link above offer a few alternatives that will work for you. 

 

proc sort data=sashelp.stocks out=stocks; 
    by stock date; 
run;

data want;
    array p{0:3} _temporary_;

    set stocks; 

    by stock;

    if first.stock then call missing(of p{*});
    p{mod(_n_,4)} = open;

    *if you want to have a certain number of records before you calculate statistics;
    if n(of p{*})>2 then do;
        lowest = min(of p{*});
        highest = max(of p{*});
    end;


run;
novinosrin
Tourmaline | Level 20

@Reeza Thank you. So in essence, does moving avg mean--> avg(x,lag(x)) ? regardless of technology we use

 

EDIT: Got it. Thank you. I just needed to get my mind to grip with some jargon . 

Reeza
Super User

In general yes, but the definition can vary a bit. For example it could be a leading window or a window statistic or rolling average. The terms are kind of used interchangeably and you need to basically check what a person means when they use it :S. 

I'm finding that more and more as 'data science' brings in computer science terminology rather than statistical terminology.

 


@novinosrin wrote:

@Reeza Thank you. So in essence, does moving avg mean--> avg(x,lag(x)) ? regardless of technology we use


 

mkeintz
PROC Star

You want yearly std of quafterly data for each gvkey/fiscal year.

 

proc summary data=have nway ;
  output out=want (drop=_type_) std(roaq cfq)= /autoname;
  class gvkey fyear;
run;

 

This is probably more efficient than your proc sql approach, although unlikely to be as efficient as a data step - given that your Compustat data are almost certainly sorted by gvkey/fyear/fqtr.

 

The will produce variables _freq_, STD_roaq and STD_cfq.  If you want to eliminate fiscal years with less than four quarters, you could do this:

 


proc summary data=have nway ;
  output out=want (drop=_type_ where=(_freq_=4)) std(roaq cfq)= /autoname;
  class gvkey fyear;
run;

 

The advantage of this approach is that proc summary offers many other statistics in addition to std.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Thierrynguyen
Calcite | Level 5

Thank you very much! Unfortunately, the codes offered by you do not solve my issue. @mkeintz provided a code to calculate yearly std based on 4 quarter data (from here, it is sure that I can calculate the moving average of std for 5 years). However, it is not really what I want. I need to calculate std based on 20 quarter data, or 5 years, and require at least non-missing 16 quarters (randomly is ok). @Reeza and @mkeintz do you have any other ideas?

 

Thank you all very much! Especially, thanks to @Reeza as you have proved that you are an outstanding expert in SAS (you provided several solutions for me so far).

 

Best regards,

Tri

Thierrynguyen
Calcite | Level 5

@Reeza, yes I can use proc expand.

mkeintz
PROC Star

OK, now I understand that you want rolling std's for quarterly data spanning 5 years, with a minimum requirement of 16 quarters of data.

 

Questions:

  1. Do you want rolling quarterly std's, or just the std's for period ending in Q4?
  2. When a quarter is missing, do you mean that the entire record for that GVKEY/FYEARQ/FQTR is missing, or do you mean that the record is present, but the variable of interest has a missing value?
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Thierrynguyen
Calcite | Level 5
Thanks, here is more explanation:
1/ Yes I only want for period ending in Q4, assuming that Q4 is ending of fiscal year.

2/ not clear what is your question. But I mean that if firms released, e.g., only 16 out of 20 quarterly reports in 5 years (from year t-4 to year t), we still have data to calculate standard deviations. Hence, there is non-missing for Q4 of year t. Is that clear?

Thanks,

Best regards,
Thierry

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 20 replies
  • 2954 views
  • 2 likes
  • 4 in conversation