resolution for missing lead function

JOLSAS · Posted 08-11-2011 05:34 PM

Dear all,

I know there isn't a lead function in SAS and I happen to need this function here in this project I'm doing and this project is due in two weeks. I've googled a few solutions but I still cannot figure out how to do it in my example here. Time is urgent. So please help!

I have a data set looking like this:

gvkey	execid	year	ceo	cfo	other
103	6	1992	0	0	1
103	6	1993	1	0	0
103	6	1994	1	0	0
259	6	1996	0	0	1
259	6	1997	0	0	1
259	6	1998	1	0	0
259	6	1999	1	0	0
259	6	2000	1	0	0
1990	6	2003	0	0	1
1990	6	2004	0	0	1
1990	6	2005	0	1	0
1990	6	2006	0	1	0
34	19	1994	0	0	1
34	19	1995	0	0	1
34	19	1996	0	0	1
34	19	1997	0	0	1
2503	19	1998	0	0	1
2503	19	1999	1	0	0
2503	19	2000	1	0	0
2503	19	2001	1	0	0
2503	19	2002	1	0	0

It's basically saying: the executive with ID (execid) 6 works in the company 103 (gvkey) from 1992 to 1994. S/he is a CEO in 1993 and 1994 (as shown by the "1" in the ceo column) and not a CFO or other executive (as shown by the "0"s in the other columns). S/he is an "other" executive in 1992 though. Then in 1996, this executive goes to the company 259 to work, being an "other" executive in the first two years and then a CEO in the last three years. Then he moved to the company 1990, and is an "other" executive for the first two years and a CFO for the next two years. And then the next section is for the executive #19.

Now I'm interested in how many people are CEOs/CFOs/others in their "old" companies (as indicated by the title the last year he's there) and how many are CEOs/CFOs/others in their new companies (as indicated by the LAST year he's there)? This means that #6 moves from 103 as a CEO to 259 as a CEO but not as an other executive and then moves to 1990 as a CFO.

Further, I need to build a table and fill in all the nine blanks (except the one under "prior title"):

prior title		new title
	CEO	CFO	other
CEO
CFO
other

So, how many people for all the nine possibilities of prior title -> new title?

Using last.gvkey=1 and last.execid=0, I was able to identify those "old"titles. But to identify new titles and match these two, I need the lead function. Can someone help me with this please? Thank you so much!

chang_y_chung_hotmail_com · Posted 08-11-2011 06:35 PM

Here is one way. HTH.

   /* test data */
   data one;
     input gvkey execid year ceo cfo other;
   cards;
   103     6 1992 0 0 1
   103 6 1993 1 0 0
   103   6 1994 1 0 0
   259   6 1996 0 0 1
   259   6 1997 0 0 1
   259   6 1998 1 0 0
   259   6 1999 1 0 0
   259   6 2000 1 0 0
   1990   6 2003 0 0 1
   1990   6 2004 0 0 1
   1990   6 2005 0 1 0
   1990    6 2006 0 1 0
   34     19 1994 0 0 1
   34     19 1995 0 0 1
   34     19 1996 0 0 1
   34     19 1997 0 0 1
   2503 19 1998 0 0 1
   2503 19 1999 1 0 0
   2503 19 2000 1 0 0
   2503 19 2001 1 0 0
   2503 19 2002 1 0 0
   ;
   run;

   /* compress into exec-company */
   %let vars = gvkey execid from to ceo cfo other;
   %let varsc = %sysfunc(translate(&vars,%str(,), %str( )));
   data two;
     call missing(&varsc);
     do until (last.gvkey);
        set one;
        by execid gvkey notsorted;
        if first.gvkey then from = year;
        if last.gvkey then to = year;
        ceos = sum(ceos, ceo);
        cfos = sum(cfos, cfo);
        others = sum(others, other);
     end;
     ceo = ceos > 0;
     cfo = cfos > 0;
     other = others > 0;
     keep &vars;
   run;

   /* pair previous exec-co observation with current */
   data three;
     set two;
     by execid notsorted;
     exceo = ifn(first.execid, ., lag(ceo));
     excfo = ifn(first.execid, ., lag(cfo));
     exother = ifn(first.execid, ., lag(other));
   run;

   /* summarize */
   data _null_;
     set three end=end;
     array count[1:9] _temporary_ (9*0);
     if exceo then do;
       if ceo then count[1] + 1;
       if cfo then count[2] + 1;
       if other then count[3] + 1;
     end;
     if excfo then do;
       if ceo then count[4] + 1;
       if cfo then count[5] + 1;
       if other then count[6] + 1;
     end;
     if exother then do;
       if ceo then count[7] + 1;
       if cfo then count[8] + 1;
       if other then count[9] + 1;
     end;
     if end then do;
       put @1 "ex" @10 "current";
       put @1 " "   @10 "CEO" @20 "CFO" @30 "Other";
       put @1 "CEO" @10 count[1] @20 count[2] @30 count[3];
       put @1 "CFO" @10 count[4] @20 count[5] @30 count[6];
       put @1 "Other" @10 count[7] @20 count[8] @30 count[9];
     end;
   run;
   /* on log
   ex       current
            CEO       CFO       Other
   CEO      1         1         2
   CFO      0         0         0
   Other    2         1         3
   */

JOLSAS · Posted 08-11-2011 06:51 PM

Wow, I definitely appreciate you putting this much effort on this. Thank you very much!

So I've run it, and after the line 6 "set one;", the log file says:

ERROR: Variable ceo has been defined as both character and numeric.

Any idea why is this?

Thank you.

ERROR: Variable cfo has been defined as both character and numeric.

art297 · Posted 08-11-2011 07:06 PM

I ran Chang's code and didn't get the error. In your own dataset is CEO possibly defined as a character (rather than numeric) variable? If it is, rather than change his code, just add a datastep at the beginning. Something like:

data one (drop=a_ceo);

set one (rename=(ceo=a_ceo));

ceo=input(a_ceo,1.);

run;

JOLSAS · Posted 08-11-2011 07:38 PM

I had realized it and I fixed it.

So I was looking at the result. I know I have 1143 different executives in total. so the number of ceo, cfo, and other should add up to 1143, and so should those of exceo,excfo, and exother.

I used:

proc freq data=three;

tables ceo cfo other exceo excfo exother;

run;

but I have ceo, cfo , other adding up to 3114, and 1518 for the sum of the exes.

I'm trying to figure out why...... Maybe I got something wrong.....

art297 · Posted 08-11-2011 07:57 PM

The requirement to have the prior title as part of the table could be problematic. If one comes in and leaves the first company in the same position, you have no way of identifying their prior position, thus would not know which cell to count them in.

JOLSAS · Posted 08-11-2011 08:24 PM

Hello art, I am so sorry and embarrassed.

Remember the selecting data you helped me with yesterday?

While working on this table, I found a sentence in very small font, in a peculiar place, about the selecting data.

What I described yesterday is to pick those years that executives change their firms. If they have multiple changes, then we pick all of them. Well, this new sentence told me that we should only pick up the last time the executive changes firms, which is to say, each executive would have only two different companies, being the last two companies he stayed. If he had previously stayed in other companies, we would delete the older items.

I tried to go back and make revisions to your codes, but it's just too much for me...

Can you think of a way of either revising the codes, or writing something new to delete those older company years? Wichever is convenient.

In fact, with each executive only having two firms, I think I might be able to finish this table myself.

Again, thank you so much!

JOLSAS · Posted 08-11-2011 08:04 PM

I looked at Chang's output

/* on log

ex current

CEO CFO Other

CEO 1 1 2

CFO 0 0 0

Other 2 1 3

*/

I thought this MIGHT be the reason.

Chang's codes seem to count the sum of how many cases of company change for each executive. For example, the data set has three cases of company changes, and one case of ceo -> ceo,one is ceo -> cfo, and the other is other -> ceo. The change of 1990 -> 34 doesn't count because the executive changed. I only care about executive changing their companies. But I don't care if I reach a new executive. I don't know if this makes sense...

I guess what I'd want is a table like this:

/* on log

ex current

CEO CFO Other Total

CEO 1 1 0 2

CFO 0 0 0 0

Other 1 0 0 1

2 1 0 3

*/

So basically I need to count the number of title changes as executives move from company to another.

I hope this makes a little bit more sense.

Anyway, thank you guys so much. I've been able to make some progress with my project, which I couldn't have done without your help.

DBailey · Posted 08-12-2011 09:30 AM

By identifying the most recent position, and the one immediately prior, you can calculate the changes this way as well:

data work.detail_data;

input gvkey execid year ceo cfo other;

cards;

103 6 1992 0 0 1

103 6 1993 1 0 0

103 6 1994 1 0 0

259 6 1996 0 0 1

259 6 1997 0 0 1

259 6 1998 1 0 0

259 6 1999 1 0 0

259 6 2000 1 0 0

1990 6 2003 0 0 1

1990 6 2004 0 0 1

1990 6 2005 0 1 0

1990 6 2006 0 1 0

34 19 1994 0 0 1

34 19 1995 0 0 1

34 19 1996 0 0 1

34 19 1997 0 0 1

2503 19 1998 0 0 1

2503 19 1999 1 0 0

2503 19 2000 1 0 0

2503 19 2001 1 0 0

2503 19 2002 1 0 0

;

run;

proc sql;

create table work.most_recent as select t1.* from work.detail_data t1

where year=(select max(year) from work.detail_data where execid=t1.execid);

create table work.immediately_prior as select t1.* from work.detail_data t1

where t1.year=(select max(year) from work.detail_data where execid=t1.execid

and year < (select year from work.most_recent where execid=t1.execid));

select

case

when t1.ceo=1 then 'CEO'

when t1.cfo=1 then 'CFO'

else 'Other'

end as From_Title,

sum(case when t2.ceo=1 then 1 else 0 end) as To_CEO,

sum(case when t2.cfo=1 then 1 else 0 end) as To_CFO,

sum(case when t2.other=1 then 1 else 0 end) as To_Other

from

work.immediately_prior t1

inner join work.most_recent t2

on t1.execid=t2.execid

group by calculated from_title;

quit;

which produces this output:

From_Title	To_CEO	To_CFO	To_Other
CEO	1	0	0
CFO	0	1	0

Registration is open