Hi all
If someone could help me calculate these specific averages I would be most grateful.
I have attached my clean data in a csv file to this post, as you will find the number of observations is not too cumbersome.
Basically my objective is for each firm, identified by the permno, I need to calculate the following equation:
(Average CFO in years +1 to +3) - (Average CFO in years -4 to -2) = Difference
Where difference should be the new column created. The CFO information is provided, I just need to determine how to write code for these averages.
Anyone with the brain power please help me with this = )
If you understand the code I wrote before, it would be easy.
proc import datafile='c:\step2.csv' out=test_vol dbms=csv replace;getnames=yes;run; proc sql; create table temp as select h.*, (select avg(cfo) from test_vol where lpermno = h.lpermno and year between h.year+1 and h.year+3 )- (select avg(cfo) from test_vol where lpermno = h.lpermno and year between h.year-4 and h.year-2) as difference from test_vol as h;quit;
Ksharp
Hi Ksharp
Yes I do understand your code however this takes a very long time to execute for me (7+ hours). Although this could be due to a weak processing power on my PC. May I ask are you able to obtain results using the code above?
If so, could you please post the results in a CSV file and I can have a look?
Many thanks
That is because your data is a lttle big (260,000+ ).
if you can pick up some obs from it, you will see the result .
A fast way is spliting it into lots of sub-table. then use the code for each sub-table respectively.
data test_vol;
set test_vol(obs=10000);
run;
消息编辑者为:xia keshan
Ok I will try smaller sample periods. You will notice this sample differs to my last post, the dates in the sample above are integers (rather than sas dates). With that in mind, will your code still work?
It doesn't seem like you have all that much data to me. I wrote it using LAGS and LEADS computed as LAGS after descending sort on YEAR and it only takes a second or so including all that sorting.
You could try similar using this to get started
Hi data_null_
I would like to say cool picture haha, I like cats.
Anyway, to business. Really, your code only took few seconds? Sounds brilliant! Could I please see the code for lags and leads? I apologize for being so brash, but I'm on a rather tight deadline with this analysis.
Katy
I like cats too.
This program uses the nieve method to compute LEADS. Do you have access to SAS/ETS? If you do I think this can all be done with one call to PROC EXPAND. Anyway see if you can get this to run on your computer. The sort at the end may not be necessary depending on what your next step is.
FILENAME FT47F001 'path to csv';
data cfo /view=cfo;
infile FT74F001 firstobs=2 dsd termstr=crlf;
input lpermno $ year cfo;
run;
data lags;
set cfo;
by lpermno;
array _l
lag2 = lag2(cfo);
lag3 = lag3(cfo);
lag4 = lag4(cfo);
if first.lpermno then c=0;
c+1;
do _n_ = c to dim(_l);
_l[_n_] = .;
end;
drop c lag1;
run;
proc sort data=lags;
by lpermno descending year;
run;
data leads;
set lags;
by lpermno;
array _l
lead2 = lag2(cfo);
lead3 = lag3(cfo);
if first.lpermno then c=0;
c+1;
do _n_ = c to dim(_l);
_l[_n_] = .;
end;
/*(Average CFO in years +1 to +3) - (Average CFO in years -4 to -2) = Difference*/
difference = mean(of lead1-lead3) - mean(of lag4-lag2);
drop c;
run;
proc sort data=leads;
by lpermno year;
run;
Hi data_null
Thank you! The code worked swimmingly! Extremely fast too! The code produced two databases: lags and leads. From my interpretation, the difference column provided in the leads database is what I'm after. Is it safe to delete lags db?
Also, I have provided a link below to a funny blog about a missing cat. I hope you find it as funny as I did!
Yes the data set LAGS can be discarded.
One thing you should consider is when there are some unconsecutive year in your data, NULL's code would not work.
1998
2002 <-missing 1999 2000 2001
2003
2005
2006
It is true that if the data should be expanded to work properly then my program will need to be modified to expand the data for each LPERMNO. There are also duplicate years. Messy.
Thanks everyone.
Just one more step, I have merged the desired data back to my larger database. Mainly I want to compare the differences of individual observations against their industries. Please see attachment.
First, I need to calculate the median 'difference' for each industry. Note the industry classification is represented by a 2-3 digit number in the sicind column.
If the permnos (or individual observation) difference is greater than its industry's median difference (indiff) then a 1 appears in a new column entitled 'Maturity'.
If the permnos (or individual observation) difference is less than its industry's median difference (indfiff) then a 0 appears in a new column entitled 'Maturity'
I was thinking of using something like the code below, what do you guys think?
proc summary data = stage3;
by sicind;
var difference; output out = test median = difference;
run;
If you want fastest speed, I think HashTable can do it. But That will produce horrible long code .I am afraid that I have not so much time to code it.
If I were you ,I will use HashTable splite it into lots of small tables ,then call execute() to get the result and using proc append to combine them all together.
Ksharp
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.