Hello All,
I need your help. I have a pooled dataset with more than 5000 firm observations with the following information: year (yr) firm identifier (id), and Expert. EXPERT consists of three values (1, 2, 3). EXPERT takes value 1 if the CEO has a business degree, 2 if the CEO has an engineering degree, and 3 if the CEO has a medical degree.
I want to create a new variable Status that takes the following values:
(1) The first year of the firm and until the change in CEO, Status = “bb” if the EXPERT =1, “ee” if the EXPERT =2, and “mm” if the EXPERT =3.
(2) In the year of CEO change and the following years until any new change in CEO, Status takes the following values:
Status is “be” if the current year EXPERT = 2 and previous year EXPERT = 1.
Status is “bm” if the current year EXPERT = 3 and previous year EXPERT = 1.
Status is “eb” if the current year EXPERT = 1 and previous year EXPERT =2.
Status is “em” if the current year EXPERT = 3 and previous year EXPERT =2.
Status is “mb” if the current year EXPERT = 1 and previous year EXPERT =3.
Status is “me” if the current year EXPERT = 2 and previous year EXPERT =3.
data have;
input yr id $ Expert;
cards;
1980 a 1
1981 a 1
1982 a 2
1983 a 2
1984 a 2
1985 a 2
1986 a 2
1987 a 3
1988 a 3
1980 b 2
1981 b 2
1982 b 2
1984 b 3
1985 b 3
1986 b 3
1987 b 3
1981 c 3
1982 c 3
1983 c 3
1984 c 3
1985 c 1
1986 c 1
1987 c 1
1988 c 1
1989 c 3
1990 c 3
1992 c 3
1993 c 3
1994 c 2
1995 c 2
1996 c 2
The final sample I want is the following:
yr id Expert status
1980 a 1 bb
1981 a 1 bb
1982 a 2 be
1983 a 2 be
1984 a 2 be
1985 a 2 be
1986 a 2 be
1987 a 3 em
1988 a 3 em
1980 b 2 ee
1981 b 2 ee
1982 b 2 ee
1984 b 3 em
1985 b 3 em
1986 b 3 em
1987 b 3 em
1981 c 3 mm
1982 c 3 mm
1983 c 3 mm
1984 c 3 mm
1985 c 1 mb
1986 c 1 mb
1987 c 1 mb
1988 c 1 mb
1989 c 3 bm
1990 c 3 bm
1992 c 3 bm
1993 c 3 bm
1994 c 2 me
1995 c 2 me
1996 c 2 me
I would appreciate it if someone provides me with the codes.
Thank you for your time.
Best regards,
Gokul
Here you go.
data have;
input yr id $ Expert;
cards;
1980 a 1
1981 a 1
1982 a 2
1983 a 2
1984 a 2
1985 a 2
1986 a 2
1987 a 3
1988 a 3
1980 b 2
1981 b 2
1982 b 2
1984 b 3
1985 b 3
1986 b 3
1987 b 3
1981 c 3
1982 c 3
1983 c 3
1984 c 3
1985 c 1
1986 c 1
1987 c 1
1988 c 1
1989 c 3
1990 c 3
1992 c 3
1993 c 3
1994 c 2
1995 c 2
1996 c 2
;
data want(drop=_:);
set have;
by id yr;
length status $2;
retain status;
_lag_expert=lag(expert);
if first.id then
do;
if expert=1 then status='bb';
else if expert=2 then status='ee';
else if expert=3 then status='mm';
end;
else
do;
if EXPERT = 2 and _lag_expert = 1 then status='be';
else if EXPERT = 3 and _lag_expert = 1 then status='bm';
else if EXPERT = 1 and _lag_expert = 2 then status='eb';
else if EXPERT = 3 and _lag_expert = 2 then status='em';
else if EXPERT = 1 and _lag_expert = 3 then status='mb';
else if EXPERT = 2 and _lag_expert = 3 then status='me';
end;
run;
Here you go.
data have;
input yr id $ Expert;
cards;
1980 a 1
1981 a 1
1982 a 2
1983 a 2
1984 a 2
1985 a 2
1986 a 2
1987 a 3
1988 a 3
1980 b 2
1981 b 2
1982 b 2
1984 b 3
1985 b 3
1986 b 3
1987 b 3
1981 c 3
1982 c 3
1983 c 3
1984 c 3
1985 c 1
1986 c 1
1987 c 1
1988 c 1
1989 c 3
1990 c 3
1992 c 3
1993 c 3
1994 c 2
1995 c 2
1996 c 2
;
data want(drop=_:);
set have;
by id yr;
length status $2;
retain status;
_lag_expert=lag(expert);
if first.id then
do;
if expert=1 then status='bb';
else if expert=2 then status='ee';
else if expert=3 then status='mm';
end;
else
do;
if EXPERT = 2 and _lag_expert = 1 then status='be';
else if EXPERT = 3 and _lag_expert = 1 then status='bm';
else if EXPERT = 1 and _lag_expert = 2 then status='eb';
else if EXPERT = 3 and _lag_expert = 2 then status='em';
else if EXPERT = 1 and _lag_expert = 3 then status='mb';
else if EXPERT = 2 and _lag_expert = 3 then status='me';
end;
run;
Thank you to both of you for providing me with the codes.
Let's reframe your task description. You want a two-letter STATUS code, in which the first letter comes from the first obs for each ID, and the second letter comes from the current obs. The letter is just a simple translation of the EXPERT code (1--'b', 2-->'e', 3-->'m'):
data want (drop=_:);
set have;
by id expert notsorted;
if expert=1 then _single_char='b'; else
if expert=2 then _single_char='e'; else
if expert=3 then _single_char='m';
retain _first_single_char ' ';
if first.id then _first_single_char=_single_char;
length status $2;
status=cats(_first_single_char,_single_char);
run;
But why not
Then by comparing EXPERT to EXPERT_PRIOR and/or EXPERT1 you have a precise indicator of type of transition of CEO expertise.
data want2 ;
set have;
by id expert notsorted;
if first.id then expert1=expert;
retain expert1;
expert_prior=lag(expert);
if first.id then expert_prior=.;
run;
BTW, what if a given CEO has more than one type of expertise?
The data is arranged in such a way that the CEO can have only one expertise. Thank you for the codes.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.