BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GokulK
Calcite | Level 5

Hello All,

I need your help. I have a pooled dataset with more than 5000 firm observations with the following information: year (yr) firm identifier (id), and Expert. EXPERT consists of three values (1, 2, 3). EXPERT takes value 1 if the CEO has a business degree, 2 if the CEO has an engineering degree, and 3 if the CEO has a medical degree.

 

I want to create a new variable Status that takes the following values:

(1) The first year of the firm and until the change in CEO, Status = “bb” if the EXPERT =1, “ee” if the EXPERT =2, and “mm” if the EXPERT =3.

 

(2) In the year of CEO change and the following years until any new change in CEO, Status takes the following values:

Status is “be” if the current year EXPERT = 2 and previous year EXPERT = 1.

Status is “bm” if the current year EXPERT = 3 and previous year EXPERT = 1.

Status is “eb” if the current year EXPERT = 1 and previous year EXPERT =2.

Status is “em” if the current year EXPERT = 3 and previous year EXPERT =2.

Status is “mb” if the current year EXPERT = 1 and previous year EXPERT =3.

Status is “me” if the current year EXPERT = 2 and previous year EXPERT =3.

 

 

data have;

input yr id $ Expert;

cards;

1980 a 1

1981 a 1

1982 a 2

1983 a 2

1984 a 2

1985 a 2

1986 a 2

1987 a 3

1988 a 3

1980 b 2

1981 b 2

1982 b 2

1984 b 3

1985 b 3

1986 b 3

1987 b 3

1981 c 3

1982 c 3

1983 c 3

1984 c 3

1985 c 1

1986 c 1

1987 c 1

1988 c 1

1989 c 3

1990 c 3

1992 c 3

1993 c 3

1994 c 2

1995 c 2

1996 c 2

 

The final sample I want is the following:

yr    id Expert status

1980 a     1     bb

1981 a     1     bb

1982 a     2     be

1983 a     2     be

1984 a     2     be

1985 a     2     be

1986 a     2     be

1987 a     3     em

1988 a     3     em

 

1980 b     2     ee

1981 b     2     ee

1982 b     2     ee

1984 b     3     em

1985 b     3     em

1986 b     3     em

1987 b     3     em

 

1981 c     3     mm

1982 c     3     mm

1983 c     3     mm

1984 c     3     mm

1985 c     1     mb

1986 c     1     mb

1987 c     1     mb

1988 c     1     mb

1989 c     3     bm

1990 c     3     bm

1992 c     3     bm

1993 c     3     bm

1994 c     2     me

1995 c     2     me

1996 c     2     me

 

I would appreciate it if someone provides me with the codes.

Thank you for your time.

Best regards,

Gokul

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Here you go.

data have;
  input yr id $ Expert;
  cards;
1980 a 1
1981 a 1
1982 a 2
1983 a 2
1984 a 2
1985 a 2
1986 a 2
1987 a 3
1988 a 3
1980 b 2
1981 b 2
1982 b 2
1984 b 3
1985 b 3
1986 b 3
1987 b 3
1981 c 3
1982 c 3
1983 c 3
1984 c 3
1985 c 1
1986 c 1
1987 c 1
1988 c 1
1989 c 3
1990 c 3
1992 c 3
1993 c 3
1994 c 2
1995 c 2
1996 c 2
;

data want(drop=_:);
  set have;
  by id yr;
  length status $2;
  retain status;
  _lag_expert=lag(expert);
  if first.id then 
    do;
           if expert=1 then status='bb';
      else if expert=2 then status='ee';
      else if expert=3 then status='mm';
    end;
  else 
    do;
           if EXPERT = 2 and _lag_expert = 1 then status='be';
      else if EXPERT = 3 and _lag_expert = 1 then status='bm';
      else if EXPERT = 1 and _lag_expert = 2 then status='eb'; 
      else if EXPERT = 3 and _lag_expert = 2 then status='em';
      else if EXPERT = 1 and _lag_expert = 3 then status='mb';
      else if EXPERT = 2 and _lag_expert = 3 then status='me';
    end;
run;

View solution in original post

4 REPLIES 4
Patrick
Opal | Level 21

Here you go.

data have;
  input yr id $ Expert;
  cards;
1980 a 1
1981 a 1
1982 a 2
1983 a 2
1984 a 2
1985 a 2
1986 a 2
1987 a 3
1988 a 3
1980 b 2
1981 b 2
1982 b 2
1984 b 3
1985 b 3
1986 b 3
1987 b 3
1981 c 3
1982 c 3
1983 c 3
1984 c 3
1985 c 1
1986 c 1
1987 c 1
1988 c 1
1989 c 3
1990 c 3
1992 c 3
1993 c 3
1994 c 2
1995 c 2
1996 c 2
;

data want(drop=_:);
  set have;
  by id yr;
  length status $2;
  retain status;
  _lag_expert=lag(expert);
  if first.id then 
    do;
           if expert=1 then status='bb';
      else if expert=2 then status='ee';
      else if expert=3 then status='mm';
    end;
  else 
    do;
           if EXPERT = 2 and _lag_expert = 1 then status='be';
      else if EXPERT = 3 and _lag_expert = 1 then status='bm';
      else if EXPERT = 1 and _lag_expert = 2 then status='eb'; 
      else if EXPERT = 3 and _lag_expert = 2 then status='em';
      else if EXPERT = 1 and _lag_expert = 3 then status='mb';
      else if EXPERT = 2 and _lag_expert = 3 then status='me';
    end;
run;
GokulK
Calcite | Level 5

Thank you to both of you for providing me with the codes. 

mkeintz
PROC Star

Let's reframe your task description.   You want a two-letter STATUS code, in which the first letter comes from the first obs for each ID, and the second letter comes from the current obs.  The letter is just a simple translation of the EXPERT code  (1--'b', 2-->'e', 3-->'m'):

 

data want (drop=_:);
  set have;
  by id expert notsorted;

  if expert=1 then _single_char='b'; else
  if expert=2 then _single_char='e'; else
  if expert=3 then _single_char='m'; 

  retain _first_single_char ' ';
  if first.id then _first_single_char=_single_char;

  length status $2;
  status=cats(_first_single_char,_single_char);
run;

But why not

  1. Make a variable (EXPERT1) representing EXPERT for the first obs of an ID.
  2. Make a second variable (EXPERT_PRIOR) represent the EXPERT value of the prior obs

Then by comparing EXPERT to EXPERT_PRIOR and/or EXPERT1 you have a precise indicator of type of transition of CEO expertise.

 

data want2 ;
  set have;
  by id expert notsorted;

  if first.id then expert1=expert;
  retain expert1;
  expert_prior=lag(expert);
  if first.id then expert_prior=.;
run;

 

BTW, what if a given CEO has more than one type of expertise?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
GokulK
Calcite | Level 5

The data is arranged in such a way that the CEO can have only one expertise. Thank you for the codes.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 798 views
  • 0 likes
  • 3 in conversation