Dear SAS Community. I am trying to identify if CEO is also a chair on the board. For this purpose I need to create a dummy equal to 1 if the CEO is also a chair. The text is contained in the TITLEANN.
Here is the sample code I use. First I clean the TITLEANN of all the unnecessary characters. Then I would like to use findw with multiple conditions but I am able to use only one. How can I identify the word chairman regardless of whether it is small or large caps and all of its variations like "chmn" and so on. Is it with prxmatch or findw function?
Can I have a list of all the variations of chairman and use them with this function?
data ceo_anncomp1;
set ceo_anncomp;
TITLEANN1 = compbl(translate(TITLEANN, " ", ";.,:()-/&"));
if findw(TITLEANN1,"chmn CEO")>0 then ceo_chair=1;
else ceo_chair=0;
run;
data test1.TEST;
infile datalines dsd truncover;
input id:$6. YEAR:32. TITLEANN:$250.;
label id="Company ID Number" YEAR="Fiscal Year" TITLEANN="Annual Title";
datalines4;
008596,1992,chmn. & pres.
010884,1993,chmn. & CEO
014881,1993,chmn. & CEO
002812,1993,chmn. & CEO
008596,1993,chmn. & pres.
012349,1994,chmn. & CEO
010884,1994,chmn. & CEO
014881,1994,chmn. & CEO
002812,1994,chmn. & CEO
013468,1994,CEO
008596,1994,chmn. & pres.
012349,1995,chmn. & CEO
010884,1995,chmn. & CEO
014881,1995,chmn. & CEO
002812,1995,chmn. & CEO
013468,1995,chmn. & CEO
008596,1995,chmn. & CEO
012349,1996,v-p & acting CEO
010884,1996,chmn. & CEO
014881,1996,chmn. & CEO
028524,1996,chmn. & CEO
002812,1996,chmn. & CEO
013468,1996,chmn. & CEO
008596,1996,pres. & CEO
010884,1997,chmn. & CEO
014881,1997,chmn. & CEO
028524,1997,CEO
002812,1997,chmn. & CEO
013468,1997,CEO
008596,1997,"chmn, pres. & CEO"
010884,1998,chmn. & CEO
028524,1998,chmn. & CEO
002812,1998,chmn. & CEO
008596,1998,"chmn, pres. & CEO"
010884,1999,chmn. & CEO
028524,1999,chmn. & CEO
002812,1999,chmn. & CEO
010884,2000,chairman & CEO
028524,2000,CEO
002812,2000,chairman & CEO
140760,2000,president & CEO
010884,2001,chairman & CEO
028524,2001,vice chairman & CEO
002812,2001,chairman & CEO
140760,2001,president & CEO
010884,2002,"chairman, president & CEO"
028524,2002,CEO
002812,2002,chairman & CEO
140760,2002,president & CEO
010884,2003,"chairman, president & CEO"
028524,2003,president & CEO
140760,2003,president & CEO
010884,2004,"chairman, president & CEO"
028524,2004,president & CEO
002812,2004,chairman & CEO
140760,2004,"chairman, president & CEO"
010884,2005,"chairman, president & CEO"
028524,2005,president & CEO
002812,2005,chairman & CEO
140760,2005,"chairman, president & CEO"
010884,2006,"chairman, president & CEO"
028524,2006,president & CEO
002812,2006,chairman & CEO
140760,2006,"chairman, president & CEO"
010884,2007,"chairman, president & CEO"
002812,2007,president & CEO
140760,2007,"chairman, president & CEO"
010884,2008,"chairman, president & CEO"
002812,2008,president & CEO
030443,2008,
140760,2008,"chairman, president & CEO"
010884,2009,"Chairman, Chief Executive Officer and President"
002812,2009,"Chief Executive Officer, President, Director and Member of Corporate Compensation & Benefits Committee"
030443,2009,Chairman and Chief Executive Officer
140760,2009,"chairman, president & CEO"
010884,2010,"Chairman, Chief Executive Officer and President"
002812,2010,"Chief Executive Officer, President, Director and Member of Executive Committee"
030443,2010,Chairman and Chief Executive Officer
140760,2010,"Chairman, Chief Executive Officer, President and Chairman of Executive Committee"
010884,2011,"Chairman, Chief Executive Officer and President"
002812,2011,"Chief Executive Officer, President, Director and Member of Executive Committee"
030443,2011,Chairman and Chief Executive Officer
140760,2011,"Chairman, Chief Executive Officer, President and Chairman of Executive Committee"
010884,2012,Consultant
002812,2012,Director and Member of Executive Committee
030443,2012,Chairman and Chief Executive Officer
140760,2012,"Chairman, Chief Executive Officer and Chairman of Executive Committee"
010884,2013,"Chief Executive Officer, President and Director"
002812,2013,"Chairman, Chief Executive Officer and Chairman of Executive Committee"
030443,2013,Chairman of the Board and Chief Executive Officer
140760,2013,"Chairman, Chief Executive Officer and Chairman of Executive Committee"
010884,2014,"Chief Executive Officer, President and Director"
002812,2014,"Chairman, Chief Executive Officer and Chairman of Executive Committee"
030443,2014,Chairman of the Board and Chief Executive Officer
140760,2014,"Chairman, Chief Executive Officer and Chairman of Executive Committee"
030443,2015,Chairman and Chief Executive Officer
140760,2015,"Chairman, Chief Executive Officer and Chairman of Executive Committee"
002812,2016,Chairman and Chief Executive Officer
030443,2016,Chairman and Chief Executive Officer
140760,2016,"Executive Chairman, Chief Executive Officer and Chairman of Executive Committee"
;;;;
data TEST;
infile datalines dsd truncover;
input id:$6. YEAR TITLEANN:$250.;
label id="Company ID Number" YEAR="Fiscal Year" TITLEANN="Annual Title";
if prxmatch('/\b(chmn|chairman)\b/i',TITLEANN) then is_chair=1;
if prxmatch('/\b(CEO|Chief Executive Officer)\b/i',TITLEANN) then is_ceo=1;
ceo_chair=ifn(is_chair and is_ceo ,1,0);
datalines4;
008596,1992,chmn. & pres.
010884,1993,chmn. & CEO
014881,1993,chmn. & CEO
002812,1993,chmn. & CEO
008596,1993,chmn. & pres.
012349,1994,chmn. & CEO
010884,1994,chmn. & CEO
014881,1994,chmn. & CEO
002812,1994,chmn. & CEO
013468,1994,CEO
008596,1994,chmn. & pres.
012349,1995,chmn. & CEO
010884,1995,chmn. & CEO
014881,1995,chmn. & CEO
002812,1995,chmn. & CEO
013468,1995,chmn. & CEO
008596,1995,chmn. & CEO
012349,1996,v-p & acting CEO
010884,1996,chmn. & CEO
014881,1996,chmn. & CEO
028524,1996,chmn. & CEO
002812,1996,chmn. & CEO
013468,1996,chmn. & CEO
008596,1996,pres. & CEO
010884,1997,chmn. & CEO
014881,1997,chmn. & CEO
028524,1997,CEO
002812,1997,chmn. & CEO
013468,1997,CEO
008596,1997,"chmn, pres. & CEO"
010884,1998,chmn. & CEO
028524,1998,chmn. & CEO
002812,1998,chmn. & CEO
008596,1998,"chmn, pres. & CEO"
010884,1999,chmn. & CEO
028524,1999,chmn. & CEO
002812,1999,chmn. & CEO
010884,2000,chairman & CEO
028524,2000,CEO
002812,2000,chairman & CEO
140760,2000,president & CEO
010884,2001,chairman & CEO
028524,2001,vice chairman & CEO
002812,2001,chairman & CEO
140760,2001,president & CEO
010884,2002,"chairman, president & CEO"
028524,2002,CEO
002812,2002,chairman & CEO
140760,2002,president & CEO
010884,2003,"chairman, president & CEO"
028524,2003,president & CEO
140760,2003,president & CEO
010884,2004,"chairman, president & CEO"
028524,2004,president & CEO
002812,2004,chairman & CEO
140760,2004,"chairman, president & CEO"
010884,2005,"chairman, president & CEO"
028524,2005,president & CEO
002812,2005,chairman & CEO
140760,2005,"chairman, president & CEO"
010884,2006,"chairman, president & CEO"
028524,2006,president & CEO
002812,2006,chairman & CEO
140760,2006,"chairman, president & CEO"
010884,2007,"chairman, president & CEO"
002812,2007,president & CEO
140760,2007,"chairman, president & CEO"
010884,2008,"chairman, president & CEO"
002812,2008,president & CEO
030443,2008,
140760,2008,"chairman, president & CEO"
010884,2009,"Chairman, Chief Executive Officer and President"
002812,2009,"Chief Executive Officer, President, Director and Member of Corporate Compensation & Benefits Committee"
030443,2009,Chairman and Chief Executive Officer
140760,2009,"chairman, president & CEO"
010884,2010,"Chairman, Chief Executive Officer and President"
002812,2010,"Chief Executive Officer, President, Director and Member of Executive Committee"
030443,2010,Chairman and Chief Executive Officer
140760,2010,"Chairman, Chief Executive Officer, President and Chairman of Executive Committee"
010884,2011,"Chairman, Chief Executive Officer and President"
002812,2011,"Chief Executive Officer, President, Director and Member of Executive Committee"
030443,2011,Chairman and Chief Executive Officer
140760,2011,"Chairman, Chief Executive Officer, President and Chairman of Executive Committee"
010884,2012,Consultant
002812,2012,Director and Member of Executive Committee
030443,2012,Chairman and Chief Executive Officer
140760,2012,"Chairman, Chief Executive Officer and Chairman of Executive Committee"
010884,2013,"Chief Executive Officer, President and Director"
002812,2013,"Chairman, Chief Executive Officer and Chairman of Executive Committee"
030443,2013,Chairman of the Board and Chief Executive Officer
140760,2013,"Chairman, Chief Executive Officer and Chairman of Executive Committee"
010884,2014,"Chief Executive Officer, President and Director"
002812,2014,"Chairman, Chief Executive Officer and Chairman of Executive Committee"
030443,2014,Chairman of the Board and Chief Executive Officer
140760,2014,"Chairman, Chief Executive Officer and Chairman of Executive Committee"
030443,2015,Chairman and Chief Executive Officer
140760,2015,"Chairman, Chief Executive Officer and Chairman of Executive Committee"
002812,2016,Chairman and Chief Executive Officer
030443,2016,Chairman and Chief Executive Officer
140760,2016,"Executive Chairman, Chief Executive Officer and Chairman of Executive Committee"
;;;;
data TEST;
infile datalines dsd truncover;
input id:$6. YEAR TITLEANN:$250.;
label id="Company ID Number" YEAR="Fiscal Year" TITLEANN="Annual Title";
if prxmatch('/\b(chmn|chairman)\b/i',TITLEANN) then is_chair=1;
if prxmatch('/\b(CEO|Chief Executive Officer)\b/i',TITLEANN) then is_ceo=1;
ceo_chair=ifn(is_chair and is_ceo ,1,0);
datalines4;
008596,1992,chmn. & pres.
010884,1993,chmn. & CEO
014881,1993,chmn. & CEO
002812,1993,chmn. & CEO
008596,1993,chmn. & pres.
012349,1994,chmn. & CEO
010884,1994,chmn. & CEO
014881,1994,chmn. & CEO
002812,1994,chmn. & CEO
013468,1994,CEO
008596,1994,chmn. & pres.
012349,1995,chmn. & CEO
010884,1995,chmn. & CEO
014881,1995,chmn. & CEO
002812,1995,chmn. & CEO
013468,1995,chmn. & CEO
008596,1995,chmn. & CEO
012349,1996,v-p & acting CEO
010884,1996,chmn. & CEO
014881,1996,chmn. & CEO
028524,1996,chmn. & CEO
002812,1996,chmn. & CEO
013468,1996,chmn. & CEO
008596,1996,pres. & CEO
010884,1997,chmn. & CEO
014881,1997,chmn. & CEO
028524,1997,CEO
002812,1997,chmn. & CEO
013468,1997,CEO
008596,1997,"chmn, pres. & CEO"
010884,1998,chmn. & CEO
028524,1998,chmn. & CEO
002812,1998,chmn. & CEO
008596,1998,"chmn, pres. & CEO"
010884,1999,chmn. & CEO
028524,1999,chmn. & CEO
002812,1999,chmn. & CEO
010884,2000,chairman & CEO
028524,2000,CEO
002812,2000,chairman & CEO
140760,2000,president & CEO
010884,2001,chairman & CEO
028524,2001,vice chairman & CEO
002812,2001,chairman & CEO
140760,2001,president & CEO
010884,2002,"chairman, president & CEO"
028524,2002,CEO
002812,2002,chairman & CEO
140760,2002,president & CEO
010884,2003,"chairman, president & CEO"
028524,2003,president & CEO
140760,2003,president & CEO
010884,2004,"chairman, president & CEO"
028524,2004,president & CEO
002812,2004,chairman & CEO
140760,2004,"chairman, president & CEO"
010884,2005,"chairman, president & CEO"
028524,2005,president & CEO
002812,2005,chairman & CEO
140760,2005,"chairman, president & CEO"
010884,2006,"chairman, president & CEO"
028524,2006,president & CEO
002812,2006,chairman & CEO
140760,2006,"chairman, president & CEO"
010884,2007,"chairman, president & CEO"
002812,2007,president & CEO
140760,2007,"chairman, president & CEO"
010884,2008,"chairman, president & CEO"
002812,2008,president & CEO
030443,2008,
140760,2008,"chairman, president & CEO"
010884,2009,"Chairman, Chief Executive Officer and President"
002812,2009,"Chief Executive Officer, President, Director and Member of Corporate Compensation & Benefits Committee"
030443,2009,Chairman and Chief Executive Officer
140760,2009,"chairman, president & CEO"
010884,2010,"Chairman, Chief Executive Officer and President"
002812,2010,"Chief Executive Officer, President, Director and Member of Executive Committee"
030443,2010,Chairman and Chief Executive Officer
140760,2010,"Chairman, Chief Executive Officer, President and Chairman of Executive Committee"
010884,2011,"Chairman, Chief Executive Officer and President"
002812,2011,"Chief Executive Officer, President, Director and Member of Executive Committee"
030443,2011,Chairman and Chief Executive Officer
140760,2011,"Chairman, Chief Executive Officer, President and Chairman of Executive Committee"
010884,2012,Consultant
002812,2012,Director and Member of Executive Committee
030443,2012,Chairman and Chief Executive Officer
140760,2012,"Chairman, Chief Executive Officer and Chairman of Executive Committee"
010884,2013,"Chief Executive Officer, President and Director"
002812,2013,"Chairman, Chief Executive Officer and Chairman of Executive Committee"
030443,2013,Chairman of the Board and Chief Executive Officer
140760,2013,"Chairman, Chief Executive Officer and Chairman of Executive Committee"
010884,2014,"Chief Executive Officer, President and Director"
002812,2014,"Chairman, Chief Executive Officer and Chairman of Executive Committee"
030443,2014,Chairman of the Board and Chief Executive Officer
140760,2014,"Chairman, Chief Executive Officer and Chairman of Executive Committee"
030443,2015,Chairman and Chief Executive Officer
140760,2015,"Chairman, Chief Executive Officer and Chairman of Executive Committee"
002812,2016,Chairman and Chief Executive Officer
030443,2016,Chairman and Chief Executive Officer
140760,2016,"Executive Chairman, Chief Executive Officer and Chairman of Executive Committee"
;;;;
Thanks Ksharp this works partially. However what if I am trying to find the exact matching word. For Example now if we have "vice chairman" and "chairman" both will be flagged as is_chair=1. Is there a way to be more precise?
Try
if prxmatch('/\S\b(chmn|chairman)\b/i',TITLEANN) then is_chair=1;
or
if prxmatch('/[",](chmn|chairman)\b/i',TITLEANN) then is_chair=1;
To get both flags, Maybe:
if prxmatch('/\b(vice\W?ch(mn|airman)\b/i',TITLEANN) then is_vicechair=1;
else if prxmatch('/\b(chmn|chairman)\b/ i',TITLEANN) then is_chair =1;
That is a big challenge . You need list all the words .
i.e.
if prxmatch('/\b(Vice Chairman|Vice-Chairman)\b/i') then is_chair=0 ;
Hi @Agent1592
In cases like this, I would use another approach, the use of a lookup list instead of using algoritms to identify CEO's and Chairmen. It is a little more complicated and maybe overkill in your case, but I normally work with production code, where data quality is a major concern. The reason is that
So here we go. Input is copied from your post and not repeated here.
First part is to create a distinct list of titles. A data step is used to split the title string in individual titles with one opuput record per title.:
data test2 (drop=i); set test;
length w $250 title $100;
w = tranwrd(upcase(tranwrd(titleann,' and ',',')),'&',',');;
do i = 1 to count(w,',')+1;
title = left(upcase(scan(w,i,',')));
output;
end;
run;
proc sql;
create table titles as
select distinct title
from test2;
quit;
data _null_; set titles;
put '0 ' title;
run;
the result is a list written to the log, 5 out of 19 lines shown:
0 ACTING CEO
0 BENEFITS COMMITTEE
0 CEO
0 CHAIRMAN
0 CHAIRMAN OF EXECUTIVE COMMITTEE
Then I copy the list from the LOG window to data step code in the Program Editor and change code values from "0" to "1" (CEO) or "2" (Chairman) where appropriate. The result is this code, which should wtite to a permanent data set in production code.
data titlevalue;
infile cards truncover;
input value title $100.;
cards;
0
0 ACTING CEO
0 BENEFITS COMMITTEE
1 CEO
2 CHAIRMAN
2 CHAIRMAN OF EXECUTIVE COMMITTEE
2 CHAIRMAN OF THE BOARD
1 CHIEF EXECUTIVE OFFICER
2 CHMN
2 CHMN.
0 CONSULTANT
0 DIRECTOR
2 EXECUTIVE CHAIRMAN
0 MEMBER OF CORPORATE COMPENSATION
0 MEMBER OF EXECUTIVE COMMITTEE
0 PRES.
0 PRESIDENT
0 V-P
0 VICE CHAIRMAN
;
run;
Now we are ready to use the list as a lookup table. The following code reads the original input table and is a veriant of the code used above to split the titles, but here it doesn't split observations, byt identifies CEO and chairmen AND reports titles not accounted for. Note that I added an extra line to the input before running to show the handling of unknown titles:
data ceochair (keep=id year titleann ceo_chair);
length w $250 title $100 value 8;
if _N_ = 1 then do;
declare hash h(dataset:'work.titlevalue');
h.defineKey('title');
h.defineData('value');
h.defineDone();
call missing(value);
end;
set test;
w = tranwrd(upcase(tranwrd(titleann,' and ',',')),'&',',');;
do i = 1 to count(w,',')+1;
title = left(upcase(scan(w,i,',')));
rc=h.find();
if value = 2 then ceo = 1;
if value = 1 then chair = 1;
if value = . then put 'INFO: Unknown title: ' id= year= title=;
end;
if ceo = 1 and chair = 1 then ceo_chair = 1;
run;
The log shows the result:
Part of the result data set - actually the coding might be inaccurate, because I made errors in the the lookup table, but the point is that it follows the rules set in the table, not in an error-prone algoritm:
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.