Hello All,
I need your help. I have a pooled dataset with more than 5000 firm observations with the following information: year (yr) firm identifier (id), and Expert. EXPERT consists of three values (1, 2, 3). EXPERT takes value 1 if the CEO has a business degree, 2 if the CEO has an engineering degree, and 3 if the CEO has a medical degree.
I want to create a new variable Status that takes the following values:
(1) The first year of the firm and until the change in CEO, Status = “bb” if the EXPERT =1, “ee” if the EXPERT =2, and “mm” if the EXPERT =3.
(2) In the year of CEO change and the following years until any new change in CEO, Status takes the following values:
Status is “be” if the current year EXPERT = 2 and previous year EXPERT = 1.
Status is “bm” if the current year EXPERT = 3 and previous year EXPERT = 1.
Status is “eb” if the current year EXPERT = 1 and previous year EXPERT =2.
Status is “em” if the current year EXPERT = 3 and previous year EXPERT =2.
Status is “mb” if the current year EXPERT = 1 and previous year EXPERT =3.
Status is “me” if the current year EXPERT = 2 and previous year EXPERT =3.
data have;
input yr id $ Expert;
cards;
1980 a 1
1981 a 1
1982 a 2
1983 a 2
1984 a 2
1985 a 2
1986 a 2
1987 a 3
1988 a 3
1980 b 2
1981 b 2
1982 b 2
1984 b 3
1985 b 3
1986 b 3
1987 b 3
1981 c 3
1982 c 3
1983 c 3
1984 c 3
1985 c 1
1986 c 1
1987 c 1
1988 c 1
1989 c 3
1990 c 3
1992 c 3
1993 c 3
1994 c 2
1995 c 2
1996 c 2
The final sample I want is the following:
yr id Expert status
1980 a 1 bb
1981 a 1 bb
1982 a 2 be
1983 a 2 be
1984 a 2 be
1985 a 2 be
1986 a 2 be
1987 a 3 em
1988 a 3 em
1980 b 2 ee
1981 b 2 ee
1982 b 2 ee
1984 b 3 em
1985 b 3 em
1986 b 3 em
1987 b 3 em
1981 c 3 mm
1982 c 3 mm
1983 c 3 mm
1984 c 3 mm
1985 c 1 mb
1986 c 1 mb
1987 c 1 mb
1988 c 1 mb
1989 c 3 bm
1990 c 3 bm
1992 c 3 bm
1993 c 3 bm
1994 c 2 me
1995 c 2 me
1996 c 2 me
I would appreciate it if someone provides me with the codes.
Thank you for your time.
Best regards,
Gokul
Here you go.
data have;
input yr id $ Expert;
cards;
1980 a 1
1981 a 1
1982 a 2
1983 a 2
1984 a 2
1985 a 2
1986 a 2
1987 a 3
1988 a 3
1980 b 2
1981 b 2
1982 b 2
1984 b 3
1985 b 3
1986 b 3
1987 b 3
1981 c 3
1982 c 3
1983 c 3
1984 c 3
1985 c 1
1986 c 1
1987 c 1
1988 c 1
1989 c 3
1990 c 3
1992 c 3
1993 c 3
1994 c 2
1995 c 2
1996 c 2
;
data want(drop=_:);
set have;
by id yr;
length status $2;
retain status;
_lag_expert=lag(expert);
if first.id then
do;
if expert=1 then status='bb';
else if expert=2 then status='ee';
else if expert=3 then status='mm';
end;
else
do;
if EXPERT = 2 and _lag_expert = 1 then status='be';
else if EXPERT = 3 and _lag_expert = 1 then status='bm';
else if EXPERT = 1 and _lag_expert = 2 then status='eb';
else if EXPERT = 3 and _lag_expert = 2 then status='em';
else if EXPERT = 1 and _lag_expert = 3 then status='mb';
else if EXPERT = 2 and _lag_expert = 3 then status='me';
end;
run;
Here you go.
data have;
input yr id $ Expert;
cards;
1980 a 1
1981 a 1
1982 a 2
1983 a 2
1984 a 2
1985 a 2
1986 a 2
1987 a 3
1988 a 3
1980 b 2
1981 b 2
1982 b 2
1984 b 3
1985 b 3
1986 b 3
1987 b 3
1981 c 3
1982 c 3
1983 c 3
1984 c 3
1985 c 1
1986 c 1
1987 c 1
1988 c 1
1989 c 3
1990 c 3
1992 c 3
1993 c 3
1994 c 2
1995 c 2
1996 c 2
;
data want(drop=_:);
set have;
by id yr;
length status $2;
retain status;
_lag_expert=lag(expert);
if first.id then
do;
if expert=1 then status='bb';
else if expert=2 then status='ee';
else if expert=3 then status='mm';
end;
else
do;
if EXPERT = 2 and _lag_expert = 1 then status='be';
else if EXPERT = 3 and _lag_expert = 1 then status='bm';
else if EXPERT = 1 and _lag_expert = 2 then status='eb';
else if EXPERT = 3 and _lag_expert = 2 then status='em';
else if EXPERT = 1 and _lag_expert = 3 then status='mb';
else if EXPERT = 2 and _lag_expert = 3 then status='me';
end;
run;
Thank you to both of you for providing me with the codes.
Let's reframe your task description. You want a two-letter STATUS code, in which the first letter comes from the first obs for each ID, and the second letter comes from the current obs. The letter is just a simple translation of the EXPERT code (1--'b', 2-->'e', 3-->'m'):
data want (drop=_:);
set have;
by id expert notsorted;
if expert=1 then _single_char='b'; else
if expert=2 then _single_char='e'; else
if expert=3 then _single_char='m';
retain _first_single_char ' ';
if first.id then _first_single_char=_single_char;
length status $2;
status=cats(_first_single_char,_single_char);
run;
But why not
Then by comparing EXPERT to EXPERT_PRIOR and/or EXPERT1 you have a precise indicator of type of transition of CEO expertise.
data want2 ;
set have;
by id expert notsorted;
if first.id then expert1=expert;
retain expert1;
expert_prior=lag(expert);
if first.id then expert_prior=.;
run;
BTW, what if a given CEO has more than one type of expertise?
The data is arranged in such a way that the CEO can have only one expertise. Thank you for the codes.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.