02-26-2012 08:09 PM
Good days to all,
I have a variable, return on assets (ROA) financial years 2003 to 2010 for US firms. I would like to calculate performance volatility (=standard deviation of ROA for 3 years) using quarterly data. It does mean that there will be (4 qtrs x 3 years =) maximum ROA of 12 quarters for the calculation of standard deviation.
fyearq = financial year
fqtr = quarter of the financial year (there are 4 quarters per annum)
tic = ticker number (company identification codes)
datafqtr = financial year + quarter
ROA = return on asset
I would like to calculate performance volatility for the following years:
year 2003 = standard deviation of ROA in 3 years (overlapping period 2003 - 2005)
year 2004 = standard deviation of ROA in 3 years (overlapping period 2004 - 2006)
year 2005 = standard deviation of ROA in 3 years (overlapping period 2005 - 2007)
year 2006 = standard deviation of ROA in 3 years (overlapping period 2006 - 2008)
year 2007 = standard deviation of ROA in 3 years (overlapping period 2007 - 2009)
year 2008 = standard deviation of ROA in 3 years (overlapping period 2008 - 2010)
1. if there is missing value is found in between/at the middle of the overlapping period, the standard deviation calculation shouldn't be affected (ie. the standard deviation value should be derived).
2. If missing value is found at the begining/ending of the year, the standard deviation calculation should be ignored (ie. should not be derived).
Thank you and hope to get some valuable response.
02-26-2012 08:21 PM
You can always use the intnx function to get the starting and ending dates (see: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212700.htm )
and set the value to missing if you don't have a non-missing value for those dates.
proc summary might be the easiest way to do the calculations and will ignore missing values.
02-26-2012 08:27 PM
Can proc expand helpful? Can SAS read the id as YYYYQQ?
I used proc expand for calculation of lag values. I know this function can calculate the lead values too.
I also think about another way - using FIRST. and LAST. function. But I do not know how this function apply to both tic and quarter?
02-26-2012 08:37 PM
I wish to calculate the standard deviation of ROA for a given firm for 3 years period. As all the ROA are arranged in a column, I must identify the first and last ROA used to calculate the standard deviation. Therefore, I feel that FIRST. and LAST. might be useful for data cleaning before the SD calculation.
Or I also consider to arrange the ROA in (8 years x 4 qtrs) 32 columns using proc expand for easy calculation of SD.
02-26-2012 09:01 PM
or simply select records between x and y. According to your specs, if you don't have a record for both x and y, then you are not supposed to calculate the SD.
02-26-2012 09:08 PM
This query does the trick: it excludes all cases where at least one of the missing ROA values occurs on a first quarter (fqtr=1) . Otherwise, it calculates the standard deviations from all non-missing ROA values. There are 33661 std values returned.
input begYear @@;
2003 2004 2005 2006 2007 2008
create table Roa2Std as
select tic, begYear, std(ROA) as RoaStd
from fPath.Roa2, beginnings
where fyearq between begYear and begYear+2
group by tic, begYear
having sum(ROA is not missing or fqtr ne 1) = 12;