Help using Base SAS procedures

+1 to +3 year average, and -2 to -4 year average

Reply
Contributor
Posts: 21

+1 to +3 year average, and -2 to -4 year average

Hi all

If someone could help me calculate these specific averages I would be most grateful.

I have attached my clean data in a csv file to this post, as you will find the number of observations is not too cumbersome.

Basically my objective is for each firm, identified by the permno, I need to calculate the following equation:

(Average CFO in years +1 to +3)  - (Average CFO in years -4 to -2) = Difference

Where difference should be the new column created. The CFO information is provided, I just need to determine how to write code for these averages.

Anyone with the brain power please help me with this = )

Attachment
Super User
Posts: 10,035

Re: +1 to +3 year average, and -2 to -4 year average

Posted in reply to kt_uwa1990

If you understand the code I wrote before, it would be easy.

proc import datafile='c:\step2.csv' out=test_vol dbms=csv replace;getnames=yes;run;

proc sql;
create table temp as
 select h.*,
      (select avg(cfo) from test_vol where lpermno = h.lpermno and year between h.year+1 and h.year+3 )-
      (select avg(cfo) from test_vol where lpermno = h.lpermno and year between h.year-4 and h.year-2) as  difference
  from test_vol as h;quit;

Ksharp

Contributor
Posts: 21

Re: +1 to +3 year average, and -2 to -4 year average

Hi Ksharp

Yes I do understand your code however this takes a very long time to execute for me (7+ hours). Although this could be due to a weak processing power on my PC.  May I ask are you able to obtain results using the code above?

If so, could you please post the results in a CSV file and I can have a look?

Many thanks

Super User
Posts: 10,035

Re: +1 to +3 year average, and -2 to -4 year average

Posted in reply to kt_uwa1990

That is because your data is a lttle big (260,000+ ).

if you can pick up some obs from it, you will see the result .

A fast way is spliting it into lots of sub-table. then use the code for each sub-table respectively.

data test_vol;

set test_vol(obs=10000);

run;

消息编辑者为:xia keshan

Contributor
Posts: 21

Re: +1 to +3 year average, and -2 to -4 year average

Ok I will try smaller sample periods. You will notice this sample differs to my last post, the dates in the sample above are integers (rather than sas dates). With that in mind, will your code still work?

Respected Advisor
Posts: 3,799

Re: +1 to +3 year average, and -2 to -4 year average

Posted in reply to kt_uwa1990

It doesn't seem like you have all that much data to me.  I wrote it using LAGS and LEADS computed as LAGS after descending sort on YEAR and it only takes a second or so including all that sorting.

You could try similar using this to get started 

http://support.sas.com/kb/24/694.html

Contributor
Posts: 21

Re: +1 to +3 year average, and -2 to -4 year average

Posted in reply to data_null__

Hi data_null_

I would like to say cool picture haha, I like cats.

Anyway, to business. Really, your code only took few seconds? Sounds brilliant! Could I please see the code for lags and leads? I apologize for being so brash, but I'm on a rather tight deadline with this analysis.

Katy

Respected Advisor
Posts: 3,799

Re: +1 to +3 year average, and -2 to -4 year average

Posted in reply to kt_uwa1990

I like cats too.

This program uses the nieve method to compute LEADS.  Do you have access to SAS/ETS? If you do I think this can all be done with one call to PROC EXPAND.  Anyway see if you can get this to run on your computer.  The sort at the end may not be necessary depending on what your next step is.

FILENAME FT47F001 'path to csv';

data cfo /view=cfo;
   infile FT74F001 firstobs=2 dsd termstr=crlf;
   input lpermno $ year cfo;
   run;
data lags;
   set cfo;
   by lpermno;
   array _l

  • lag1-lag4;
  •    lag1 = lag1(cfo);
       lag2 = lag2(cfo);
       lag3 = lag3(cfo);
       lag4 = lag4(cfo);
      
    if first.lpermno then c=0;
       c+
    1;
      
    do _n_ = c to dim(_l);
          _l[_n_] = .;
         
    end;
      
    drop c lag1;
       run;
    proc sort data=lags;
       by lpermno descending year;
       run;
    data leads;
       set lags;
       by lpermno;
       array _l
  • lead1-lead3;
  •    lead1 = lag1(cfo);
       lead2 = lag2(cfo);
       lead3 = lag3(cfo);
      
    if first.lpermno then c=0;
       c+
    1;
      
    do _n_ = c to dim(_l);
          _l[_n_] = .;
         
    end;
    /*(Average CFO in years +1 to +3)  - (Average CFO in years -4 to -2) = Difference*/
       difference = mean(of lead1-lead3) - mean(of lag4-lag2);
       drop c;
       run;
    proc sort data=leads;
       by lpermno year;
       run;

    Contributor
    Posts: 21

    Re: +1 to +3 year average, and -2 to -4 year average

    Posted in reply to data_null__

    Hi data_null

    Thank you! The code worked swimmingly! Extremely fast too! The code produced two databases: lags and leads. From my interpretation, the difference column provided in the leads database is what I'm after. Is it safe to delete lags db?

    Also, I have provided a link below to a funny blog about a missing cat. I hope you find it as funny as I did!

    "yeah thats not what I was looking for at all."

    Respected Advisor
    Posts: 3,799

    Re: +1 to +3 year average, and -2 to -4 year average

    Posted in reply to kt_uwa1990

    Yes the data set LAGS can be discarded.

    Super User
    Posts: 10,035

    Re: +1 to +3 year average, and -2 to -4 year average

    Posted in reply to kt_uwa1990

    One thing you should consider is when there are some unconsecutive year in your data, NULL's code would not work.

    1998

    2002 <-missing 1999 2000 2001

    2003

    2005

    2006

    Respected Advisor
    Posts: 3,156

    Re: +1 to +3 year average, and -2 to -4 year average

    That case, you can use proc timeseries(if you have ETS) to fill the gap first, then use DN's code. If you don't have ETS, you can always fill it up using datastep in needed.

    This is where I see not the SQL's strength.

    Haikuo

    Respected Advisor
    Posts: 3,799

    Re: +1 to +3 year average, and -2 to -4 year average

    It is true that if the data should be expanded to work properly then my program will need to be modified to expand the data for each LPERMNO.  There are also duplicate years.  Messy.

    data cfoV / view=cfoV;
       infile FT74F001 firstobs=2 dsd termstr=crlf;
       input lpermno $ year cfo;
       run;
    proc sort out=cfo2 nodupkey;
      
    by lpermno year;
       run;
    proc summary data=cfo2;
       by lpermno;
       output out=span min(year)=min max(year)=max;
      
    run;
    data frameV / view=framev;
       set span;
       do year = min to max;
          output;
         
    end;
      
    run;
    data cfo2;
       merge framev cfo2;
       by lpermno year;
       run;
    data lags;
       set cfo2;
       by lpermno;
       array _l
  • lag1-lag4;
  •    lag1 = lag1(cfo);
       lag2 = lag2(cfo);
       lag3 = lag3(cfo);
       lag4 = lag4(cfo);
      
    if first.lpermno then c=0;
       c+
    1;
      
    do _n_ = c to dim(_l);
          _l[_n_] = .;
         
    end;
      
    drop c lag1;
       run;
    proc sort data=lags;
       by lpermno descending year;
       run;
    data leads;
       set lags;
       by lpermno;
       array _l
  • lead1-lead3;
  •    lead1 = lag1(cfo);
       lead2 = lag2(cfo);
       lead3 = lag3(cfo);
      
    if first.lpermno then c=0;
       c+
    1;
      
    do _n_ = c to dim(_l);
          _l[_n_] = .;
         
    end;
    /*(Average CFO in years +1 to +3)  - (Average CFO in years -4 to -2) = Difference*/
       difference = mean(of lead1-lead3) - mean(of lag4-lag2);
       drop c;
       run;
    proc sort data=leads;
       by lpermno year;
       run;
    Contributor
    Posts: 21

    Re: +1 to +3 year average, and -2 to -4 year average

    Posted in reply to data_null__

    Thanks everyone.

    Just one more step, I have merged the desired data back to my larger database. Mainly I want to compare the differences of individual observations against their industries. Please see attachment.

    First, I need to calculate the median 'difference' for each industry. Note the industry classification is represented by a 2-3 digit number in the sicind column.

    If the permnos (or individual observation) difference is greater than its industry's median difference (indiff) then a 1 appears in a new column entitled 'Maturity'.

    If the permnos (or individual observation) difference is less than its industry's median difference (indfiff) then a 0 appears in a new column entitled 'Maturity'

    I was thinking of using something like the code below, what do you guys think?

    proc summary data = stage3;

    by sicind;

    var difference; output out = test median = difference;

    run;

    Super User
    Posts: 10,035

    Re: +1 to +3 year average, and -2 to -4 year average

    Posted in reply to kt_uwa1990

    If you want fastest speed, I think HashTable can do it. But That will produce horrible long code .I am afraid that I have not so much time to code it.

    If I were you ,I will use HashTable splite it into lots of small tables ,then call execute() to get the result and using proc append to combine them all together.

    Ksharp

    Ask a Question
    Discussion stats
    • 16 replies
    • 284 views
    • 0 likes
    • 4 in conversation