Solved: Re: Identify change in the committee membership

GokulK · Posted 03-30-2021 12:37 PM

Hello Everyone,

I have a dataset (20,000 + observations) that consists of firm identifier, year, and audit committee director names. I want to create a variable (Dir_change) if a new director joins or an existing member leaves the company (it is possible that in some cases the company may not hire a new director to replace the leaving director or hire an additional director to increase the number of members). So to give you an idea, this is how I want my dir_change variable to look.

Firm Identifier	Year	Director	Dir_change
A	2007	Jack	0
A	2007	Tom	0
A	2007	Linda	0
A	2008	Jack	1
A	2008	Tom	1
A	2008	Mary	1

B	2007	Jenny	0
B	2007	Jessy	0
B	2007	Joe	0
B	2008	Jenny	1
B	2008	Jessy	1

C	2007	Ally	0
C	2007	April	0
C	2008	Ally	0
C	2008	April	0

D	2007	Mark	0
D	2007	Frank	0
D	2008	Mark	1
D	2008	Frank	1
D	2008	Mike	1

In the above example, in company A, Linda is replacing Mary in 2008 and hence I want dir_Change to be 1 for that year. For company B, Joe left the board and hence I want the dir_Change to be 1. There was no change in the directors for company C and hence it is zero. For company D, although no one left the company, Mike joined the company in 2008 and hence I want dir_change to be 1.

I am a newbie and know only basic coding so far. I would greatly appreciate it if someone shares the code that will help me get the desired output.

Thank you for your time.

Best regards,

GokulK

PGStats · Posted 03-30-2021 02:41 PM

Here is a SQL solution involving a full join (always tricky):

data have;
input firm $ year name $;
datalines;
A   2007    Jack    0
A   2007    Tom 0
A   2007    Linda   0
A   2008    Jack    1
A   2008    Tom 1
A   2008    Mary    1
B   2007    Jenny   0
B   2007    Jessy   0
B   2007    Joe 0
B   2008    Jenny   1
B   2008    Jessy   1
C   2007    Ally    0
C   2007    April   0
C   2008    Ally    0
C   2008    April   0
D   2007    Mark    0
D   2007    Frank   0
D   2008    Mark    1
D   2008    Frank   1
D   2008    Mike    1
;

proc sql;
create table want as
select 
    h.*,
    coalesce(c.change, 0) as change
from
    have as h left join
    (
    select 
        coalesce(a.firm, b.firm) as firm,
        coalesce(a.year, b.year+1) as year,
        nmiss(a.firm) + nmiss(b.firm) > 0 as change
    from 
        have as a full join
        have as b on 
            a.firm=b.firm and 
            a.year=b.year+1 and 
            a.name=b.name
    group by calculated firm, calculated year
    having count(a.firm) > 0 and count(b.firm) > 0 ) as c
    on h.firm=c.firm and h.year=c.year;
select * from want;
quit;

PG

View solution in original post

PaigeMiller · Posted 03-30-2021 01:19 PM

Is it always two years for a firm? Or could the data contain more than two years, and then the solution ought to compare each year to the previous year? Or do you compare to the first year?

What about spelling errors and capitalization differences? Are these all cleaned up beforehand?

--
Paige Miller

GokulK · Posted 03-30-2021 02:35 PM

It is a 10-year data starting 2007. I would like to see if the committee composition changed compared to the previous year. That is, for instance, are the directors in the committee the same in 2019 as they were in 2018. Also, there is a possibility that a company may (1) lose more than one director in a year or (2) lose one and add two directors to their committee. I have capitalized all the names in my actual dataset.

Thank you.

Best regards,

GokulK

mkeintz · Posted 03-30-2021 02:08 PM

You can read each firmid/year of data twice. In the first pass, make an array of names, sort the array, then copy the names, separated by commas, to a single character variable (_this_year_names). Then set the DIR_CHANGE dummy according to whether _this_year_names matches its own lagged value (except for the first year of the id, when DIR_CHANGE is set to zero):

data have;
  input firm_id $ year dirname $;
datalines;
A	2007	Jack	0
A	2007	Tom	0
A	2007	Linda	0
A	2008	Jack	1
A	2008	Tom	1
A	2008	Mary	1
B	2007	Jenny	0
B	2007	Jessy	0
B	2007	Joe	0
B	2008	Jenny	1
B	2008	Jessy	1
C	2007	Ally	0
C	2007	April	0
C	2008	Ally	0
C	2008	April	0
D	2007	Mark	0
D	2007	Frank	0
D	2008	Mark	1
D	2008	Frank	1
D	2008	Mike	1
run;

data want (drop=_:);

  array _dnames {20} $20;      /*One year of names in an array */
  length _this_year_names _last_year_names $400; /*One year as csv long character*/

  /* Read one year of names and populate the array*/
  do _n=1 by 1 until (last.year);
    set have;
    by firm_id year;
    if first.firm_id then dir_change=0; /*First first year only*/
    _dnames{_n}=dirname;
  end;

  /* Sort the array and copy a csv list into a character variable*/
  call sortc(of _dnames{*});
  _this_year_names=catx(',',of _dnames{*});
  _last_year_names=lag(_this_year_names);

  /* Compare to last year value*/
  if dir_change=. then dir_change=(_this_year_names ^= _last_year_names);

  /* Re-read the year and output */
  do until (last.year);
    set have;
    by firm_id year;
    output;
  end;
run;

Just make sure the array size (20 above) accommodates all the names for a single year. And make sure the length of the _this_year_names & _last_year_names ($400 above) is enough to contain all the names, separated by commas.

Note the sorting of the names will put blank names (i.e. if your only have 15 names in a year, youll have 5 blanks) first, followed by the actual names. This won't matter for the task as you describe it.

And of course make sure the each distinct name is consistent in its use of uppercase/lowercase.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

PGStats · Posted 03-30-2021 02:41 PM

Here is a SQL solution involving a full join (always tricky):

data have;
input firm $ year name $;
datalines;
A   2007    Jack    0
A   2007    Tom 0
A   2007    Linda   0
A   2008    Jack    1
A   2008    Tom 1
A   2008    Mary    1
B   2007    Jenny   0
B   2007    Jessy   0
B   2007    Joe 0
B   2008    Jenny   1
B   2008    Jessy   1
C   2007    Ally    0
C   2007    April   0
C   2008    Ally    0
C   2008    April   0
D   2007    Mark    0
D   2007    Frank   0
D   2008    Mark    1
D   2008    Frank   1
D   2008    Mike    1
;

proc sql;
create table want as
select 
    h.*,
    coalesce(c.change, 0) as change
from
    have as h left join
    (
    select 
        coalesce(a.firm, b.firm) as firm,
        coalesce(a.year, b.year+1) as year,
        nmiss(a.firm) + nmiss(b.firm) > 0 as change
    from 
        have as a full join
        have as b on 
            a.firm=b.firm and 
            a.year=b.year+1 and 
            a.name=b.name
    group by calculated firm, calculated year
    having count(a.firm) > 0 and count(b.firm) > 0 ) as c
    on h.firm=c.firm and h.year=c.year;
select * from want;
quit;

PG

GokulK · Posted 03-30-2021 09:11 PM

Thank you everyone for providing me the codes.

Ksharp · Posted 03-31-2021 08:45 AM

Was there just two year ?

data have;
input firm $ year name $;
datalines;
A   2007    Jack    0
A   2007    Tom 0
A   2007    Linda   0
A   2008    Jack    1
A   2008    Tom 1
A   2008    Mary    1
B   2007    Jenny   0
B   2007    Jessy   0
B   2007    Joe 0
B   2008    Jenny   1
B   2008    Jessy   1
C   2007    Ally    0
C   2007    April   0
C   2008    Ally    0
C   2008    April   0
D   2007    Mark    0
D   2007    Frank   0
D   2008    Mark    1
D   2008    Frank   1
D   2008    Mike    1
;

proc sql;
create table temp as
select distinct firm from
(
select firm,name,count(distinct year) as count
 from have
  group by firm,name
)
group by firm
having sum(count ne 2);

create table want as
select *,case when year=max(year) and firm in (select firm from temp) then 1 else 0 end as flag
 from have as a
  group by firm
   order by 1,2;
quit;

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away