Dear all,
I know there isn't a lead function in SAS and I happen to need this function here in this project I'm doing and this project is due in two weeks. I've googled a few solutions but I still cannot figure out how to do it in my example here. Time is urgent. So please help!
I have a data set looking like this:
gvkey | execid | year | ceo | cfo | other |
103 | 6 | 1992 | 0 | 0 | 1 |
103 | 6 | 1993 | 1 | 0 | 0 |
103 | 6 | 1994 | 1 | 0 | 0 |
259 | 6 | 1996 | 0 | 0 | 1 |
259 | 6 | 1997 | 0 | 0 | 1 |
259 | 6 | 1998 | 1 | 0 | 0 |
259 | 6 | 1999 | 1 | 0 | 0 |
259 | 6 | 2000 | 1 | 0 | 0 |
1990 | 6 | 2003 | 0 | 0 | 1 |
1990 | 6 | 2004 | 0 | 0 | 1 |
1990 | 6 | 2005 | 0 | 1 | 0 |
1990 | 6 | 2006 | 0 | 1 | 0 |
34 | 19 | 1994 | 0 | 0 | 1 |
34 | 19 | 1995 | 0 | 0 | 1 |
34 | 19 | 1996 | 0 | 0 | 1 |
34 | 19 | 1997 | 0 | 0 | 1 |
2503 | 19 | 1998 | 0 | 0 | 1 |
2503 | 19 | 1999 | 1 | 0 | 0 |
2503 | 19 | 2000 | 1 | 0 | 0 |
2503 | 19 | 2001 | 1 | 0 | 0 |
2503 | 19 | 2002 | 1 | 0 | 0 |
It's basically saying: the executive with ID (execid) 6 works in the company 103 (gvkey) from 1992 to 1994. S/he is a CEO in 1993 and 1994 (as shown by the "1" in the ceo column) and not a CFO or other executive (as shown by the "0"s in the other columns). S/he is an "other" executive in 1992 though. Then in 1996, this executive goes to the company 259 to work, being an "other" executive in the first two years and then a CEO in the last three years. Then he moved to the company 1990, and is an "other" executive for the first two years and a CFO for the next two years. And then the next section is for the executive #19.
Now I'm interested in how many people are CEOs/CFOs/others in their "old" companies (as indicated by the title the last year he's there) and how many are CEOs/CFOs/others in their new companies (as indicated by the LAST year he's there)? This means that #6 moves from 103 as a CEO to 259 as a CEO but not as an other executive and then moves to 1990 as a CFO.
Further, I need to build a table and fill in all the nine blanks (except the one under "prior title"):
prior title | new title | ||
CEO | CFO | other | |
CEO | |||
CFO | |||
other |
So, how many people for all the nine possibilities of prior title -> new title?
Using last.gvkey=1 and last.execid=0, I was able to identify those "old"titles. But to identify new titles and match these two, I need the lead function. Can someone help me with this please? Thank you so much!
Here is one way. HTH.
/* test data */
data one;
input gvkey execid year ceo cfo other;
cards;
103 6 1992 0 0 1
103 6 1993 1 0 0
103 6 1994 1 0 0
259 6 1996 0 0 1
259 6 1997 0 0 1
259 6 1998 1 0 0
259 6 1999 1 0 0
259 6 2000 1 0 0
1990 6 2003 0 0 1
1990 6 2004 0 0 1
1990 6 2005 0 1 0
1990 6 2006 0 1 0
34 19 1994 0 0 1
34 19 1995 0 0 1
34 19 1996 0 0 1
34 19 1997 0 0 1
2503 19 1998 0 0 1
2503 19 1999 1 0 0
2503 19 2000 1 0 0
2503 19 2001 1 0 0
2503 19 2002 1 0 0
;
run;
/* compress into exec-company */
%let vars = gvkey execid from to ceo cfo other;
%let varsc = %sysfunc(translate(&vars,%str(,), %str( )));
data two;
call missing(&varsc);
do until (last.gvkey);
set one;
by execid gvkey notsorted;
if first.gvkey then from = year;
if last.gvkey then to = year;
ceos = sum(ceos, ceo);
cfos = sum(cfos, cfo);
others = sum(others, other);
end;
ceo = ceos > 0;
cfo = cfos > 0;
other = others > 0;
keep &vars;
run;
/* pair previous exec-co observation with current */
data three;
set two;
by execid notsorted;
exceo = ifn(first.execid, ., lag(ceo));
excfo = ifn(first.execid, ., lag(cfo));
exother = ifn(first.execid, ., lag(other));
run;
/* summarize */
data _null_;
set three end=end;
array count[1:9] _temporary_ (9*0);
if exceo then do;
if ceo then count[1] + 1;
if cfo then count[2] + 1;
if other then count[3] + 1;
end;
if excfo then do;
if ceo then count[4] + 1;
if cfo then count[5] + 1;
if other then count[6] + 1;
end;
if exother then do;
if ceo then count[7] + 1;
if cfo then count[8] + 1;
if other then count[9] + 1;
end;
if end then do;
put @1 "ex" @10 "current";
put @1 " " @10 "CEO" @20 "CFO" @30 "Other";
put @1 "CEO" @10 count[1] @20 count[2] @30 count[3];
put @1 "CFO" @10 count[4] @20 count[5] @30 count[6];
put @1 "Other" @10 count[7] @20 count[8] @30 count[9];
end;
run;
/* on log
ex current
CEO CFO Other
CEO 1 1 2
CFO 0 0 0
Other 2 1 3
*/
Wow, I definitely appreciate you putting this much effort on this. Thank you very much!
So I've run it, and after the line 6 "set one;", the log file says:
ERROR: Variable ceo has been defined as both character and numeric.
Any idea why is this?
Thank you.
ERROR: Variable cfo has been defined as both character and numeric.
I ran Chang's code and didn't get the error. In your own dataset is CEO possibly defined as a character (rather than numeric) variable? If it is, rather than change his code, just add a datastep at the beginning. Something like:
data one (drop=a_ceo);
set one (rename=(ceo=a_ceo));
ceo=input(a_ceo,1.);
run;
I had realized it and I fixed it.
So I was looking at the result. I know I have 1143 different executives in total. so the number of ceo, cfo, and other should add up to 1143, and so should those of exceo,excfo, and exother.
I used:
proc freq data=three;
tables ceo cfo other exceo excfo exother;
run;
but I have ceo, cfo , other adding up to 3114, and 1518 for the sum of the exes.
I'm trying to figure out why...... Maybe I got something wrong.....
The requirement to have the prior title as part of the table could be problematic. If one comes in and leaves the first company in the same position, you have no way of identifying their prior position, thus would not know which cell to count them in.
Hello art, I am so sorry and embarrassed.
Remember the selecting data you helped me with yesterday?
While working on this table, I found a sentence in very small font, in a peculiar place, about the selecting data.
What I described yesterday is to pick those years that executives change their firms. If they have multiple changes, then we pick all of them. Well, this new sentence told me that we should only pick up the last time the executive changes firms, which is to say, each executive would have only two different companies, being the last two companies he stayed. If he had previously stayed in other companies, we would delete the older items.
I tried to go back and make revisions to your codes, but it's just too much for me...
Can you think of a way of either revising the codes, or writing something new to delete those older company years? Wichever is convenient.
In fact, with each executive only having two firms, I think I might be able to finish this table myself.
Again, thank you so much!
I looked at Chang's output
/* on log
ex current
CEO CFO Other
CEO 1 1 2
CFO 0 0 0
Other 2 1 3
*/
I thought this MIGHT be the reason.
Chang's codes seem to count the sum of how many cases of company change for each executive. For example, the data set has three cases of company changes, and one case of ceo -> ceo,one is ceo -> cfo, and the other is other -> ceo. The change of 1990 -> 34 doesn't count because the executive changed. I only care about executive changing their companies. But I don't care if I reach a new executive. I don't know if this makes sense...
I guess what I'd want is a table like this:
/* on log
ex current
CEO CFO Other Total
CEO 1 1 0 2
CFO 0 0 0 0
Other 1 0 0 1
2 1 0 3
*/
So basically I need to count the number of title changes as executives move from company to another.
I hope this makes a little bit more sense.
Anyway, thank you guys so much. I've been able to make some progress with my project, which I couldn't have done without your help.
By identifying the most recent position, and the one immediately prior, you can calculate the changes this way as well:
data work.detail_data;
input gvkey execid year ceo cfo other;
cards;
103 6 1992 0 0 1
103 6 1993 1 0 0
103 6 1994 1 0 0
259 6 1996 0 0 1
259 6 1997 0 0 1
259 6 1998 1 0 0
259 6 1999 1 0 0
259 6 2000 1 0 0
1990 6 2003 0 0 1
1990 6 2004 0 0 1
1990 6 2005 0 1 0
1990 6 2006 0 1 0
34 19 1994 0 0 1
34 19 1995 0 0 1
34 19 1996 0 0 1
34 19 1997 0 0 1
2503 19 1998 0 0 1
2503 19 1999 1 0 0
2503 19 2000 1 0 0
2503 19 2001 1 0 0
2503 19 2002 1 0 0
;
run;
proc sql;
create table work.most_recent as select t1.* from work.detail_data t1
where year=(select max(year) from work.detail_data where execid=t1.execid);
create table work.immediately_prior as select t1.* from work.detail_data t1
where t1.year=(select max(year) from work.detail_data where execid=t1.execid
and year < (select year from work.most_recent where execid=t1.execid));
select
case
when t1.ceo=1 then 'CEO'
when t1.cfo=1 then 'CFO'
else 'Other'
end as From_Title,
sum(case when t2.ceo=1 then 1 else 0 end) as To_CEO,
sum(case when t2.cfo=1 then 1 else 0 end) as To_CFO,
sum(case when t2.other=1 then 1 else 0 end) as To_Other
from
work.immediately_prior t1
inner join work.most_recent t2
on t1.execid=t2.execid
group by calculated from_title;
quit;
which produces this output:
From_Title | To_CEO | To_CFO | To_Other |
---|---|---|---|
CEO | 1 | 0 | 0 |
CFO | 0 | 1 | 0 |
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.