BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GokulK
Calcite | Level 5

Hello Everyone,

 

I have a dataset (20,000 + observations) that consists of firm identifier, year, and audit committee director names. I want to create a variable (Dir_change) if a new director joins or an existing member leaves the company (it is possible that in some cases the company may not hire a new director to replace the leaving director or hire an additional director to increase the number of members). So to give you an idea, this is how I want my dir_change variable to look.

Firm IdentifierYear DirectorDir_change
A2007Jack0
A2007Tom0
A2007Linda0
A2008Jack1
A2008Tom1
A2008Mary1
    
B2007Jenny0
B2007Jessy0
B2007Joe0
B2008Jenny1
B2008Jessy1
    
C2007Ally0
C2007April0
C2008Ally0
C2008April0
    
D2007Mark0
D2007Frank0
D2008Mark1
D2008Frank1
D2008Mike1

 

In the above example, in company A, Linda is replacing Mary in 2008 and hence I want dir_Change to be 1 for that year. For company B, Joe left the board and hence I want the dir_Change to be 1. There was no change in the directors for company C and hence it is zero. For company D, although no one left the company, Mike joined the company in 2008 and hence I want dir_change to be 1.

 

I am a newbie and know only basic coding so far. I would greatly appreciate it if someone shares the code that will help me get the desired output.

 

Thank you for your time.

Best regards,

GokulK

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Here is a SQL solution involving a full join (always tricky):

 

data have;
input firm $ year name $;
datalines;
A   2007    Jack    0
A   2007    Tom 0
A   2007    Linda   0
A   2008    Jack    1
A   2008    Tom 1
A   2008    Mary    1
B   2007    Jenny   0
B   2007    Jessy   0
B   2007    Joe 0
B   2008    Jenny   1
B   2008    Jessy   1
C   2007    Ally    0
C   2007    April   0
C   2008    Ally    0
C   2008    April   0
D   2007    Mark    0
D   2007    Frank   0
D   2008    Mark    1
D   2008    Frank   1
D   2008    Mike    1
;

proc sql;
create table want as
select 
    h.*,
    coalesce(c.change, 0) as change
from
    have as h left join
    (
    select 
        coalesce(a.firm, b.firm) as firm,
        coalesce(a.year, b.year+1) as year,
        nmiss(a.firm) + nmiss(b.firm) > 0 as change
    from 
        have as a full join
        have as b on 
            a.firm=b.firm and 
            a.year=b.year+1 and 
            a.name=b.name
    group by calculated firm, calculated year
    having count(a.firm) > 0 and count(b.firm) > 0 ) as c
    on h.firm=c.firm and h.year=c.year;
select * from want;
quit;

PGStats_0-1617129706716.png

 

PG

View solution in original post

6 REPLIES 6
PaigeMiller
Diamond | Level 26

Is it always two years for a firm? Or could the data contain more than two years, and then the solution ought to compare each year to the previous year? Or do you compare to the first year?

What about spelling errors and capitalization differences? Are these all cleaned up beforehand?

--
Paige Miller
GokulK
Calcite | Level 5

It is a 10-year data starting 2007. I would like to see if the committee composition changed compared to the previous year. That is, for instance, are the directors in the committee the same in 2019 as they were in 2018. Also, there is a possibility that a company may (1) lose more than one director in a year or (2) lose one and add two directors to their committee. I have capitalized all the names in my actual dataset.

 

Thank you.

Best regards,

GokulK

 

mkeintz
PROC Star

You can read each firmid/year of data twice.  In the first pass, make an array of names, sort the array, then copy the names, separated by commas, to a single character variable (_this_year_names).  Then set the DIR_CHANGE dummy according to whether _this_year_names matches its own lagged value  (except for the first year of the id, when DIR_CHANGE is set to zero):

 

data have;
  input firm_id $ year dirname $;
datalines;
A	2007	Jack	0
A	2007	Tom	0
A	2007	Linda	0
A	2008	Jack	1
A	2008	Tom	1
A	2008	Mary	1
B	2007	Jenny	0
B	2007	Jessy	0
B	2007	Joe	0
B	2008	Jenny	1
B	2008	Jessy	1
C	2007	Ally	0
C	2007	April	0
C	2008	Ally	0
C	2008	April	0
D	2007	Mark	0
D	2007	Frank	0
D	2008	Mark	1
D	2008	Frank	1
D	2008	Mike	1
run;

data want (drop=_:);

  array _dnames {20} $20;      /*One year of names in an array */
  length _this_year_names _last_year_names $400; /*One year as csv long character*/

  /* Read one year of names and populate the array*/
  do _n=1 by 1 until (last.year);
    set have;
    by firm_id year;
    if first.firm_id then dir_change=0; /*First first year only*/
    _dnames{_n}=dirname;
  end;

  /* Sort the array and copy a csv list into a character variable*/
  call sortc(of _dnames{*});
  _this_year_names=catx(',',of _dnames{*});
  _last_year_names=lag(_this_year_names);

  /* Compare to last year value*/
  if dir_change=. then dir_change=(_this_year_names ^= _last_year_names);

  /* Re-read the year and output */
  do until (last.year);
    set have;
    by firm_id year;
    output;
  end;
run;

Just make sure the array size (20 above) accommodates all the names for a single year.  And make sure the length of the _this_year_names & _last_year_names ($400 above) is enough to contain all the names, separated by commas.

 

Note the sorting of the names will put blank names (i.e. if your only have 15 names in a year, youll have 5 blanks) first, followed by the actual names.  This won't matter for the task as you describe it.

 

And of course make sure the each distinct name is consistent in its use of uppercase/lowercase.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PGStats
Opal | Level 21

Here is a SQL solution involving a full join (always tricky):

 

data have;
input firm $ year name $;
datalines;
A   2007    Jack    0
A   2007    Tom 0
A   2007    Linda   0
A   2008    Jack    1
A   2008    Tom 1
A   2008    Mary    1
B   2007    Jenny   0
B   2007    Jessy   0
B   2007    Joe 0
B   2008    Jenny   1
B   2008    Jessy   1
C   2007    Ally    0
C   2007    April   0
C   2008    Ally    0
C   2008    April   0
D   2007    Mark    0
D   2007    Frank   0
D   2008    Mark    1
D   2008    Frank   1
D   2008    Mike    1
;

proc sql;
create table want as
select 
    h.*,
    coalesce(c.change, 0) as change
from
    have as h left join
    (
    select 
        coalesce(a.firm, b.firm) as firm,
        coalesce(a.year, b.year+1) as year,
        nmiss(a.firm) + nmiss(b.firm) > 0 as change
    from 
        have as a full join
        have as b on 
            a.firm=b.firm and 
            a.year=b.year+1 and 
            a.name=b.name
    group by calculated firm, calculated year
    having count(a.firm) > 0 and count(b.firm) > 0 ) as c
    on h.firm=c.firm and h.year=c.year;
select * from want;
quit;

PGStats_0-1617129706716.png

 

PG
GokulK
Calcite | Level 5

Thank you everyone for providing me the codes. 

Ksharp
Super User

Was there just two year ?

 

data have;
input firm $ year name $;
datalines;
A   2007    Jack    0
A   2007    Tom 0
A   2007    Linda   0
A   2008    Jack    1
A   2008    Tom 1
A   2008    Mary    1
B   2007    Jenny   0
B   2007    Jessy   0
B   2007    Joe 0
B   2008    Jenny   1
B   2008    Jessy   1
C   2007    Ally    0
C   2007    April   0
C   2008    Ally    0
C   2008    April   0
D   2007    Mark    0
D   2007    Frank   0
D   2008    Mark    1
D   2008    Frank   1
D   2008    Mike    1
;

proc sql;
create table temp as
select distinct firm from
(
select firm,name,count(distinct year) as count
 from have
  group by firm,name
)
group by firm
having sum(count ne 2);

create table want as
select *,case when year=max(year) and firm in (select firm from temp) then 1 else 0 end as flag
 from have as a
  group by firm
   order by 1,2;
quit;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 856 views
  • 0 likes
  • 5 in conversation