BookmarkSubscribeRSS Feed
kt_uwa1990
Calcite | Level 5

Hi all

If someone could help me calculate these specific averages I would be most grateful.

I have attached my clean data in a csv file to this post, as you will find the number of observations is not too cumbersome.

Basically my objective is for each firm, identified by the permno, I need to calculate the following equation:

(Average CFO in years +1 to +3)  - (Average CFO in years -4 to -2) = Difference

Where difference should be the new column created. The CFO information is provided, I just need to determine how to write code for these averages.

Anyone with the brain power please help me with this = )

16 REPLIES 16
Ksharp
Super User

If you understand the code I wrote before, it would be easy.

proc import datafile='c:\step2.csv' out=test_vol dbms=csv replace;getnames=yes;run;

proc sql;
create table temp as
 select h.*,
      (select avg(cfo) from test_vol where lpermno = h.lpermno and year between h.year+1 and h.year+3 )-
      (select avg(cfo) from test_vol where lpermno = h.lpermno and year between h.year-4 and h.year-2) as  difference
  from test_vol as h;quit;

Ksharp

kt_uwa1990
Calcite | Level 5

Hi Ksharp

Yes I do understand your code however this takes a very long time to execute for me (7+ hours). Although this could be due to a weak processing power on my PC.  May I ask are you able to obtain results using the code above?

If so, could you please post the results in a CSV file and I can have a look?

Many thanks

Ksharp
Super User

That is because your data is a lttle big (260,000+ ).

if you can pick up some obs from it, you will see the result .

A fast way is spliting it into lots of sub-table. then use the code for each sub-table respectively.

data test_vol;

set test_vol(obs=10000);

run;

消息编辑者为:xia keshan

kt_uwa1990
Calcite | Level 5

Ok I will try smaller sample periods. You will notice this sample differs to my last post, the dates in the sample above are integers (rather than sas dates). With that in mind, will your code still work?

data_null__
Jade | Level 19

It doesn't seem like you have all that much data to me.  I wrote it using LAGS and LEADS computed as LAGS after descending sort on YEAR and it only takes a second or so including all that sorting.

You could try similar using this to get started 

http://support.sas.com/kb/24/694.html

kt_uwa1990
Calcite | Level 5

Hi data_null_

I would like to say cool picture haha, I like cats.

Anyway, to business. Really, your code only took few seconds? Sounds brilliant! Could I please see the code for lags and leads? I apologize for being so brash, but I'm on a rather tight deadline with this analysis.

Katy

data_null__
Jade | Level 19

I like cats too.

This program uses the nieve method to compute LEADS.  Do you have access to SAS/ETS? If you do I think this can all be done with one call to PROC EXPAND.  Anyway see if you can get this to run on your computer.  The sort at the end may not be necessary depending on what your next step is.

FILENAME FT47F001 'path to csv';

data cfo /view=cfo;
   infile FT74F001 firstobs=2 dsd termstr=crlf;
   input lpermno $ year cfo;
   run;
data lags;
   set cfo;
   by lpermno;
   array _l

  • lag1-lag4;
  •    lag1 = lag1(cfo);
       lag2 = lag2(cfo);
       lag3 = lag3(cfo);
       lag4 = lag4(cfo);
      
    if first.lpermno then c=0;
       c+
    1;
      
    do _n_ = c to dim(_l);
          _l[_n_] = .;
         
    end;
      
    drop c lag1;
       run;
    proc sort data=lags;
       by lpermno descending year;
       run;
    data leads;
       set lags;
       by lpermno;
       array _l
  • lead1-lead3;
  •    lead1 = lag1(cfo);
       lead2 = lag2(cfo);
       lead3 = lag3(cfo);
      
    if first.lpermno then c=0;
       c+
    1;
      
    do _n_ = c to dim(_l);
          _l[_n_] = .;
         
    end;
    /*(Average CFO in years +1 to +3)  - (Average CFO in years -4 to -2) = Difference*/
       difference = mean(of lead1-lead3) - mean(of lag4-lag2);
       drop c;
       run;
    proc sort data=leads;
       by lpermno year;
       run;

    kt_uwa1990
    Calcite | Level 5

    Hi data_null

    Thank you! The code worked swimmingly! Extremely fast too! The code produced two databases: lags and leads. From my interpretation, the difference column provided in the leads database is what I'm after. Is it safe to delete lags db?

    Also, I have provided a link below to a funny blog about a missing cat. I hope you find it as funny as I did!

    "yeah thats not what I was looking for at all."

    data_null__
    Jade | Level 19

    Yes the data set LAGS can be discarded.

    Ksharp
    Super User

    One thing you should consider is when there are some unconsecutive year in your data, NULL's code would not work.

    1998

    2002 <-missing 1999 2000 2001

    2003

    2005

    2006

    Haikuo
    Onyx | Level 15

    That case, you can use proc timeseries(if you have ETS) to fill the gap first, then use DN's code. If you don't have ETS, you can always fill it up using datastep in needed.

    This is where I see not the SQL's strength.

    Haikuo

    data_null__
    Jade | Level 19

    It is true that if the data should be expanded to work properly then my program will need to be modified to expand the data for each LPERMNO.  There are also duplicate years.  Messy.

    data cfoV / view=cfoV;
       infile FT74F001 firstobs=2 dsd termstr=crlf;
       input lpermno $ year cfo;
       run;
    proc sort out=cfo2 nodupkey;
      
    by lpermno year;
       run;
    proc summary data=cfo2;
       by lpermno;
       output out=span min(year)=min max(year)=max;
      
    run;
    data frameV / view=framev;
       set span;
       do year = min to max;
          output;
         
    end;
      
    run;
    data cfo2;
       merge framev cfo2;
       by lpermno year;
       run;
    data lags;
       set cfo2;
       by lpermno;
       array _l
  • lag1-lag4;
  •    lag1 = lag1(cfo);
       lag2 = lag2(cfo);
       lag3 = lag3(cfo);
       lag4 = lag4(cfo);
      
    if first.lpermno then c=0;
       c+
    1;
      
    do _n_ = c to dim(_l);
          _l[_n_] = .;
         
    end;
      
    drop c lag1;
       run;
    proc sort data=lags;
       by lpermno descending year;
       run;
    data leads;
       set lags;
       by lpermno;
       array _l
  • lead1-lead3;
  •    lead1 = lag1(cfo);
       lead2 = lag2(cfo);
       lead3 = lag3(cfo);
      
    if first.lpermno then c=0;
       c+
    1;
      
    do _n_ = c to dim(_l);
          _l[_n_] = .;
         
    end;
    /*(Average CFO in years +1 to +3)  - (Average CFO in years -4 to -2) = Difference*/
       difference = mean(of lead1-lead3) - mean(of lag4-lag2);
       drop c;
       run;
    proc sort data=leads;
       by lpermno year;
       run;
    kt_uwa1990
    Calcite | Level 5

    Thanks everyone.

    Just one more step, I have merged the desired data back to my larger database. Mainly I want to compare the differences of individual observations against their industries. Please see attachment.

    First, I need to calculate the median 'difference' for each industry. Note the industry classification is represented by a 2-3 digit number in the sicind column.

    If the permnos (or individual observation) difference is greater than its industry's median difference (indiff) then a 1 appears in a new column entitled 'Maturity'.

    If the permnos (or individual observation) difference is less than its industry's median difference (indfiff) then a 0 appears in a new column entitled 'Maturity'

    I was thinking of using something like the code below, what do you guys think?

    proc summary data = stage3;

    by sicind;

    var difference; output out = test median = difference;

    run;

    Ksharp
    Super User

    If you want fastest speed, I think HashTable can do it. But That will produce horrible long code .I am afraid that I have not so much time to code it.

    If I were you ,I will use HashTable splite it into lots of small tables ,then call execute() to get the result and using proc append to combine them all together.

    Ksharp

    sas-innovate-2024.png

    Don't miss out on SAS Innovate - Register now for the FREE Livestream!

    Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

     

    Register now!

    What is Bayesian Analysis?

    Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

    Find more tutorials on the SAS Users YouTube channel.

    Click image to register for webinarClick image to register for webinar

    Classroom Training Available!

    Select SAS Training centers are offering in-person courses. View upcoming courses for:

    View all other training opportunities.

    Discussion stats
    • 16 replies
    • 1438 views
    • 0 likes
    • 4 in conversation