SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

resolution for missing lead function

Reply
Contributor
Posts: 45

resolution for missing lead function

Dear all,

I know there isn't a lead function in SAS and I happen to need this function here in this project I'm doing and this project is due in two weeks. I've googled a few solutions but I still cannot figure out how to do it in my example here. Time is urgent. So please help!

I have a data set looking like this:

gvkeyexecidyearceocfoother
10361992001
10361993100
10361994100
25961996001
25961997001
25961998100
25961999100
25962000100
199062003001
199062004001
199062005010
199062006010
34191994001
34191995001
34191996001
34191997001
2503191998001
2503191999100
2503192000100
2503192001100
2503192002100

It's basically saying: the executive with ID (execid) 6 works in the company 103 (gvkey) from 1992 to 1994. S/he is a CEO in 1993 and 1994 (as shown by the "1" in the ceo column) and not a CFO or other executive (as shown by the "0"s in the other columns). S/he is an "other" executive in 1992 though. Then in 1996, this executive goes to the company 259 to work, being an "other" executive in the first two years and then a CEO in the last three years. Then he moved to the company 1990, and is an "other" executive for the first two years and a CFO for the next two years. And then the next section is for the executive #19.

Now I'm interested in how many people are CEOs/CFOs/others in their "old" companies (as indicated by the title the last year he's there) and how many are  CEOs/CFOs/others in their new companies (as indicated by the LAST year he's there)? This means that #6 moves from 103 as a CEO to 259 as a CEO but not as an other executive and then moves to 1990 as a CFO.

Further, I need to build a table and fill in all the nine blanks (except the one under "prior title"):

prior titlenew title
CEOCFOother
CEO
CFO
other

So, how many people for all the nine possibilities of prior title -> new title?

Using last.gvkey=1 and last.execid=0, I was able to identify those "old"titles. But to identify new titles and match these two, I need the lead function. Can someone help me with this please? Thank you so much!

Regular Contributor
Posts: 241

resolution for missing lead function

Here is one way. HTH.

   /* test data */
   data one;
     input gvkey execid year ceo cfo other;
   cards;
   103     6  1992  0  0  1
   103    6  1993  1  0  0
   103     6  1994  1  0  0
   259     6  1996  0  0  1
   259     6  1997  0  0  1
   259     6  1998  1  0  0
   259     6  1999  1  0  0
   259     6  2000  1  0  0
   1990   6  2003  0  0  1
   1990   6  2004  0  0  1
   1990   6  2005  0  1  0
   1990    6  2006  0  1  0
   34     19  1994  0  0  1
   34     19  1995  0  0  1
   34     19  1996  0  0  1
   34     19  1997  0  0  1
   2503  19  1998  0  0  1
   2503  19  1999  1  0  0
   2503  19  2000  1  0  0
   2503  19  2001  1  0  0
   2503  19  2002  1  0  0
   ;
   run;

   /* compress into exec-company */
   %let vars = gvkey execid from to ceo cfo other;
   %let varsc = %sysfunc(translate(&vars,%str(,), %str( )));
   data two; 
     call missing(&varsc);
     do until (last.gvkey);
        set one;
        by execid gvkey notsorted;
        if first.gvkey then from = year;
        if last.gvkey then to = year;
        ceos = sum(ceos, ceo);
        cfos = sum(cfos, cfo);
        others = sum(others, other);
     end;
     ceo = ceos > 0;
     cfo = cfos > 0;
     other = others > 0;
     keep &vars;
   run;

   /* pair previous exec-co observation with current */
   data three;
     set two;
     by execid notsorted;
     exceo = ifn(first.execid, ., lag(ceo));
     excfo = ifn(first.execid, ., lag(cfo));
     exother = ifn(first.execid, ., lag(other));
   run;

   /* summarize */
   data _null_;
     set three end=end;
     array count[1:9] _temporary_ (9*0);
     if exceo then do;
       if ceo then count[1] + 1;
       if cfo then count[2] + 1;
       if other then count[3] + 1;
     end;
     if excfo then do;
       if ceo then count[4] + 1;
       if cfo then count[5] + 1;
       if other then count[6] + 1;
     end;
     if exother then do;
       if ceo then count[7] + 1;
       if cfo then count[8] + 1;
       if other then count[9] + 1;
     end;
     if end then do;
       put @1 "ex" @10 "current";
       put @1 " "   @10 "CEO" @20 "CFO" @30 "Other";
       put @1 "CEO" @10 count[1] @20 count[2] @30 count[3];
       put @1 "CFO" @10 count[4] @20 count[5] @30 count[6];
       put @1 "Other" @10 count[7] @20 count[8] @30 count[9];
     end;
   run;
   /* on log
   ex       current
            CEO       CFO       Other
   CEO      1         1         2
   CFO      0         0         0
   Other    2         1         3
   */

Contributor
Posts: 45

resolution for missing lead function

Posted in reply to chang_y_chung_hotmail_com

Wow, I definitely appreciate you putting this much effort on this. Thank you very much!

So I've run it, and after the line 6 "set one;", the log file says:

ERROR: Variable ceo has been defined as both character and numeric.

Any idea why is this?

Thank you.

ERROR: Variable cfo has been defined as both character and numeric.

PROC Star
Posts: 7,468

Re: resolution for missing lead function

I ran Chang's code and didn't get the error.  In your own dataset is CEO possibly defined as a character (rather than numeric) variable?  If it is, rather than change his code, just add a datastep at the beginning.  Something like:

data one (drop=a_ceo);

  set one (rename=(ceo=a_ceo));

  ceo=input(a_ceo,1.);

run;

Contributor
Posts: 45

resolution for missing lead function

I had realized it and I fixed it.

So I was looking at the result. I know I have 1143 different executives in total. so the number of ceo, cfo, and other should add up to 1143, and so should those of exceo,excfo, and exother.

I used:

proc freq data=three;

tables ceo cfo other exceo excfo exother;

run;

but I have ceo, cfo , other adding up to 3114, and 1518 for the sum of the exes.

I'm trying to figure out why...... Maybe I got something wrong.....

PROC Star
Posts: 7,468

resolution for missing lead function

The requirement to have the prior title as part of the table could be problematic.  If one comes in and leaves the first company in the same position, you have no way of identifying their prior position, thus would not know which cell to count them in.

Contributor
Posts: 45

resolution for missing lead function

Hello art, I am so sorry and embarrassed.

Remember the selecting data you helped me with yesterday?

While working on this table, I found a sentence in very small font, in a peculiar place, about the selecting data.

What I described yesterday is to pick those years that executives change their firms. If they have multiple changes, then we pick all of them. Well, this new sentence told me that we should only pick up the last time the executive changes firms, which is to say, each executive would have only two different companies, being the last two companies he stayed. If he had previously stayed in other companies, we would delete the older items.

I tried to go back and make revisions to your codes, but it's just too much for me...Smiley Sad

Can you think of a way of either revising the codes, or writing something new to delete those older company years? Wichever is convenient.

In fact, with each executive only having two firms, I think I might be able to finish this table myself.

Again, thank you so much!

Contributor
Posts: 45

resolution for missing lead function

Posted in reply to chang_y_chung_hotmail_com


I looked at Chang's output

/* on log

   ex       current

            CEO       CFO       Other

   CEO      1         1         2

   CFO      0         0         0

   Other    2         1         3

   */

I thought this MIGHT be the reason.

Chang's codes seem to count the sum of how many cases of company change for each executive. For example, the data set has three cases of company changes, and one case of ceo -> ceo,one is ceo -> cfo, and the other is other -> ceo. The change of 1990 -> 34 doesn't count because the executive changed. I only care about executive changing their companies. But I don't care if I reach a new executive.  I don't know if this makes sense...

I guess what I'd want is a table like this:

/* on log

   ex       current

            CEO       CFO       Other   Total

   CEO      1         1         0        2

   CFO      0         0         0        0

   Other   1         0         0       1

            2         1         0        3

  */

So basically I need to count the number of title changes as executives move from company to another.

I hope this makes a little bit more sense.

Anyway, thank you guys so much. I've been able to make some progress with my project, which I couldn't have done without your help.

Super Contributor
Posts: 578

resolution for missing lead function


By identifying the most recent position, and the one immediately prior, you can calculate the changes this way as well:

data work.detail_data;

input gvkey execid year ceo cfo other;

cards;

103 6 1992 0 0 1

103 6 1993 1 0 0

103 6 1994 1 0 0

259 6 1996 0 0 1

259 6 1997 0 0 1

259 6 1998 1 0 0

259 6 1999 1 0 0

259 6 2000 1 0 0

1990 6 2003 0 0 1

1990 6 2004 0 0 1

1990 6 2005 0 1 0

1990 6 2006 0 1 0

34 19 1994 0 0 1

34 19 1995 0 0 1

34 19 1996 0 0 1

34 19 1997 0 0 1

2503 19 1998 0 0 1

2503 19 1999 1 0 0

2503 19 2000 1 0 0

2503 19 2001 1 0 0

2503 19 2002 1 0 0

;

run;

proc sql;

create table work.most_recent as select t1.* from work.detail_data t1

where year=(select max(year) from work.detail_data where execid=t1.execid);

create table work.immediately_prior as select t1.* from work.detail_data t1

where t1.year=(select max(year) from work.detail_data where execid=t1.execid

and year < (select year from work.most_recent where execid=t1.execid));

select

      case

            when t1.ceo=1 then 'CEO'

            when t1.cfo=1 then 'CFO'

            else 'Other'

      end as From_Title,

      sum(case when t2.ceo=1 then 1 else 0 end) as To_CEO,

      sum(case when t2.cfo=1 then 1 else 0 end) as To_CFO,

      sum(case when t2.other=1 then 1 else 0 end) as To_Other

from

      work.immediately_prior  t1

      inner join work.most_recent t2

            on t1.execid=t2.execid

group by calculated from_title;    

quit;

which produces this output:

From_TitleTo_CEOTo_CFOTo_Other
CEO100
CFO010
Ask a Question
Discussion stats
  • 8 replies
  • 333 views
  • 4 likes
  • 4 in conversation