Contributor
Posts: 21

# +1 to +3 year average, and -2 to -4 year average

Hi all

If someone could help me calculate these specific averages I would be most grateful.

I have attached my clean data in a csv file to this post, as you will find the number of observations is not too cumbersome.

Basically my objective is for each firm, identified by the permno, I need to calculate the following equation:

(Average CFO in years +1 to +3)  - (Average CFO in years -4 to -2) = Difference

Where difference should be the new column created. The CFO information is provided, I just need to determine how to write code for these averages.

Super User
Posts: 10,761

## Re: +1 to +3 year average, and -2 to -4 year average

If you understand the code I wrote before, it would be easy.

```proc import datafile='c:\step2.csv' out=test_vol dbms=csv replace;getnames=yes;run;

proc sql;
create table temp as
select h.*,
(select avg(cfo) from test_vol where lpermno = h.lpermno and year between h.year+1 and h.year+3 )-
(select avg(cfo) from test_vol where lpermno = h.lpermno and year between h.year-4 and h.year-2) as  difference
from test_vol as h;quit;

```

Ksharp

Contributor
Posts: 21

## Re: +1 to +3 year average, and -2 to -4 year average

Hi Ksharp

Yes I do understand your code however this takes a very long time to execute for me (7+ hours). Although this could be due to a weak processing power on my PC.  May I ask are you able to obtain results using the code above?

If so, could you please post the results in a CSV file and I can have a look?

Many thanks

Super User
Posts: 10,761

## Re: +1 to +3 year average, and -2 to -4 year average

That is because your data is a lttle big (260,000+ ).

if you can pick up some obs from it, you will see the result .

A fast way is spliting it into lots of sub-table. then use the code for each sub-table respectively.

data test_vol;

set test_vol(obs=10000);

run;

Contributor
Posts: 21

## Re: +1 to +3 year average, and -2 to -4 year average

Ok I will try smaller sample periods. You will notice this sample differs to my last post, the dates in the sample above are integers (rather than sas dates). With that in mind, will your code still work?

Posts: 3,852

## Re: +1 to +3 year average, and -2 to -4 year average

It doesn't seem like you have all that much data to me.  I wrote it using LAGS and LEADS computed as LAGS after descending sort on YEAR and it only takes a second or so including all that sorting.

You could try similar using this to get started

http://support.sas.com/kb/24/694.html

Contributor
Posts: 21

## Re: +1 to +3 year average, and -2 to -4 year average

Hi data_null_

I would like to say cool picture haha, I like cats.

Anyway, to business. Really, your code only took few seconds? Sounds brilliant! Could I please see the code for lags and leads? I apologize for being so brash, but I'm on a rather tight deadline with this analysis.

Katy

Posts: 3,852

## Re: +1 to +3 year average, and -2 to -4 year average

I like cats too.

This program uses the nieve method to compute LEADS.  Do you have access to SAS/ETS? If you do I think this can all be done with one call to PROC EXPAND.  Anyway see if you can get this to run on your computer.  The sort at the end may not be necessary depending on what your next step is.

FILENAME FT47F001 'path to csv';

data cfo /view=cfo;
infile FT74F001 firstobs=2 dsd termstr=crlf;
input lpermno \$ year cfo;
run;
data lags;
set cfo;
by lpermno;
array _l

• lag1-lag4;
•    lag1 = lag1(cfo);
lag2 = lag2(cfo);
lag3 = lag3(cfo);
lag4 = lag4(cfo);

if first.lpermno then c=0;
c+
1;

do _n_ = c to dim(_l);
_l[_n_] = .;

end;

drop c lag1;
run;
proc sort data=lags;
by lpermno descending year;
run;
set lags;
by lpermno;
array _l

if first.lpermno then c=0;
c+
1;

do _n_ = c to dim(_l);
_l[_n_] = .;

end;
/*(Average CFO in years +1 to +3)  - (Average CFO in years -4 to -2) = Difference*/
drop c;
run;
by lpermno year;
run;

Contributor
Posts: 21

## Re: +1 to +3 year average, and -2 to -4 year average

Hi data_null

Thank you! The code worked swimmingly! Extremely fast too! The code produced two databases: lags and leads. From my interpretation, the difference column provided in the leads database is what I'm after. Is it safe to delete lags db?

Also, I have provided a link below to a funny blog about a missing cat. I hope you find it as funny as I did!

&quot;yeah thats not what I was looking for at all.&quot;

Posts: 3,852

## Re: +1 to +3 year average, and -2 to -4 year average

Yes the data set LAGS can be discarded.

Super User
Posts: 10,761

## Re: +1 to +3 year average, and -2 to -4 year average

One thing you should consider is when there are some unconsecutive year in your data, NULL's code would not work.

1998

2002 <-missing 1999 2000 2001

2003

2005

2006

Posts: 3,167

## Re: +1 to +3 year average, and -2 to -4 year average

That case, you can use proc timeseries(if you have ETS) to fill the gap first, then use DN's code. If you don't have ETS, you can always fill it up using datastep in needed.

This is where I see not the SQL's strength.

Haikuo

Posts: 3,852

## Re: +1 to +3 year average, and -2 to -4 year average

It is true that if the data should be expanded to work properly then my program will need to be modified to expand the data for each LPERMNO.  There are also duplicate years.  Messy.

data cfoV / view=cfoV;
infile FT74F001 firstobs=2 dsd termstr=crlf;
input lpermno \$ year cfo;
run;
proc sort out=cfo2 nodupkey;

by lpermno year;
run;
proc summary data=cfo2;
by lpermno;
output out=span min(year)=min max(year)=max;

run;
data frameV / view=framev;
set span;
do year = min to max;
output;

end;

run;
data cfo2;
merge framev cfo2;
by lpermno year;
run;
data lags;
set cfo2;
by lpermno;
array _l
• lag1-lag4;
•    lag1 = lag1(cfo);
lag2 = lag2(cfo);
lag3 = lag3(cfo);
lag4 = lag4(cfo);

if first.lpermno then c=0;
c+
1;

do _n_ = c to dim(_l);
_l[_n_] = .;

end;

drop c lag1;
run;
proc sort data=lags;
by lpermno descending year;
run;
set lags;
by lpermno;
array _l

if first.lpermno then c=0;
c+
1;

do _n_ = c to dim(_l);
_l[_n_] = .;

end;
/*(Average CFO in years +1 to +3)  - (Average CFO in years -4 to -2) = Difference*/
drop c;
run;
by lpermno year;
run;
Contributor
Posts: 21

## Re: +1 to +3 year average, and -2 to -4 year average

Thanks everyone.

Just one more step, I have merged the desired data back to my larger database. Mainly I want to compare the differences of individual observations against their industries. Please see attachment.

First, I need to calculate the median 'difference' for each industry. Note the industry classification is represented by a 2-3 digit number in the sicind column.

If the permnos (or individual observation) difference is greater than its industry's median difference (indiff) then a 1 appears in a new column entitled 'Maturity'.

If the permnos (or individual observation) difference is less than its industry's median difference (indfiff) then a 0 appears in a new column entitled 'Maturity'

I was thinking of using something like the code below, what do you guys think?

proc summary data = stage3;

by sicind;

var difference; output out = test median = difference;

run;

Super User
Posts: 10,761