Good days to all,
I attached a dataset with company identification code (ticker), directors identification code (iddir), data year (year) and directors' gender (dirgender).
I have to calculate a variable - the fraction of male directors on the board who sit on other boards on which there are female directors for a given year.
For example, let say company X have 14 directors in total in year 2010, 10 of them are males and 4 of them are females. 3 out of 10 male directors sit on the boards in other companies. 2 out of 3 male directors who sit on other boards with female directors (ie. one of 3 males directors who sit on other boards in other company without the presence of female directors).
Based on the case above:
The fraction of male directors on the board who sit on other boards on which there are female directors for Year 2010 = 2/10 or 0.2.
Hope to get any answer which relevant for the calculation.
Thank you.
mspak
what about the missing values for iddir?
Hi,
Your task can't be finished in one step (not by me at least). Following code has not been intensively tested:
/*purge records with missing data*/
data have;
set h.connected_male;
if cmiss(ticker,dirgender)=0;
if nmiss(iddir,year)=0;
run;
proc sql;
/*Mark the ticker-year group with female on board*/
create table want1 as
select distinct *, SUM(upcase(dirgender)='FEMALE')>0 AS _FEMALE FROM have
GROUP BY TICKER,YEAR
;
/*mark the iddir who holds multiple positions in different companies at the same period*/
create table want2 as
select distinct *, count(distinct ticker)>1 as _mult from want1
group by iddir, year;
/*count the positions per iddir-year */
create table want3 as
select distinct a.*,b._mult from want1 a, want2 b where
a.ticker=b.ticker
and a.year=b.year
and a.iddir=b.iddir
;
create table want4 as
select distinct *,sum(_female=1 and _mult=1) as _ct from want3
group by iddir, year;
quit;
/*Mark the iddir of interest*/
data want5;
set want4;
if ((_mult=0 and _ct>0) or (_mult=1 and _ct>1)) and upcase(dirgender)='MALE' then flag=1;
else flag=0;
run;
/*Calculated the percentage and sorted by descending percentage*/
proc sql;
create table want6 as
select ticker,year, sum(flag)/sum(upcase(dirgender)='MALE') as percent format=percent7.2 from want5
group by ticker, year
order by percent desc, ticker,year;
quit;
Regards,
Haikuo
@Haikuo
Did you consider about the duplicates?
I mean
gender company id year
male ENR 27719 2009
male ENR 27719 2009
In my code, duplicates are removed by using 'distinct' option in SQL.
The duplicates and missing values are not be included.
mspak, assuming you can fix the missing iddir problem and that the abrupt jump in the number of companies in 2009 doesn't pose a problem, the following queries should proove useful:
proc sql;
/* Companies with females on the board */
create table femComp as
select distinct year, ticker from Connected_male where char(dirGender,1)="F";
/* Connecting directors : male directors sitting on the board of more than one
company with females on the board */
create table connDir as
select distinct C.year, C.iddir
from Connected_male as C inner join femComp as F
on C.year=F.year and C.ticker=F.ticker
where C.iddir is not missing and char(C.dirGender,1)="M"
group by C.year, C.iddir
having count(C.ticker) > 1;
/* Total count of connecting male directors per year */
create table connDirCount as
select year, count(iddir) as nConn from connDir group by year;
/* Total count of male directors per year*/
create table dirCount as
select year, count(distinct iddir) as nDir from Connected_male
where char(dirGender,1)="M" group by year;
/* Proportion of connecting directors to total male director count, every year */
select T.year, C.nConn label="Connecting male directors", T.nDir label="Total male directors",
coalesce(C.nConn, 0)/T.nDir as connProp format=percent10.1 label="Percent connecting"
from dirCount as T left join connDirCount as C on T.year=C.year;
PG
Thanks PG,
I expect an output that percent of male directors which also sit in other boards with female directors for each firm-year.
I expect the following (for example):
Ticker YEAR PERCENT_CONNECTED MALE:
AAA 2003 CERTAIN PERCENTAGE
AAA 2004 CERTAIN PERCENTAGE
AAA 2005 CERTAIN PERCENTAGE
AAA 2006 CERTAIN PERCENTAGE
AAA 2007 CERTAIN PERCENTAGE
AAA 2008 CERTAIN PERCENTAGE
AAA 2009 CERTAIN PERCENTAGE
AAA 2010 CERTAIN PERCENTAGE
BBB 2003
.......and so on.
Thank you
Then, it is even simpler:
proc sql;
/* Companies with females on the board */
create table femComp as
select distinct year, ticker from test.Connected_male where char(dirGender,1)="F";
/* Connecting directors : male directors sitting on the board of more than one
company with females on the board */
create table connDir as
select distinct C.year, C.iddir
from test.Connected_male as C inner join femComp as F
on C.year=F.year and C.ticker=F.ticker
where C.iddir is not missing and char(C.dirGender,1)="M"
group by C.year, C.iddir
having count(C.ticker) > 1;
/* Proportion of connecting directors in each board, every year */
Create table propConn as
select C.ticker, C.year,
count(D.iddir) / sum(char(C.dirGender,1)="M") as pctConn format=percent10.1 label="Percent connecting"
from test.Connected_male as C left join connDir as D on C.year=D.year and C.iddir=D.iddir
where ticker is not missing
group by C.ticker, C.year;
/* print the first 20 results */
proc print data=propConn(obs=20) label; run;
-------------------------------------------------------------------
Percent
Obs Ticker YEAR connecting
1 A 2001 0.0%
2 A 2002 .
3 A 2003 0.0%
4 A 2004 0.0%
5 A 2005 0.0%
6 A 2006 0.0%
7 A 2007 0.0%
8 A 2008 0.0%
9 A 2009 86.7%
10 A 2010 88.9%
11 AA 2001 0.0%
12 AA 2002 0.0%
13 AA 2003 0.0%
14 AA 2004 0.0%
15 AA 2005 0.0%
16 AA 2006 0.0%
17 AA 2007 0.0%
18 AA 2008 0.0%
19 AA 2009 81.0%
20 AA 2010 82.6%
----------------------------------------------------------
PG
How about:
libname x v9 'c:\'; %let company=A; %let year=2009; proc sort data=x.Connected_male(where=(ticker="&company" and year=&year and dirgender='Male')) out=have(keep=iddir) nodupkey; by iddir ; run; proc sort data=x.Connected_male out=dir(keep=iddir ticker rename=(iddir=_iddir) ) nodupkey; by iddir ticker ; run; proc sort data=x.Connected_male out=gender(keep=dirgender ticker rename=(ticker=_ticker)) nodupkey; by ticker dirgender ; run; data _null_; if _n_ eq 1 then do; if 0 then set dir ; declare hash dir(hashexp:10, dataset:'dir',multidata:'Y'); declare hiter _dir('dir'); dir.definekey('_iddir'); dir.definedata('_iddir','ticker'); dir.definedone(); if 0 then set gender; declare hash gender(hashexp:10,dataset:'gender',multidata:'Y') ; declare hiter _gender('gender'); gender.definekey('_ticker'); gender.definedata('_ticker','dirgender'); gender.definedone(); end; set have end=last; male+1; flag=0; do while(_dir.next()=0); if iddir eq _iddir and ticker ne "&company" then do; do while(_gender.next()=0); if ticker eq _ticker and dirgender eq: 'F' then flag=1; end; end; end; if flag=1 then yes+1; if last then do; percent= yes/male; putlog NOTE: yes= male= percent= ; end; run;
Ksharp
Thanks Ksharp,
If not mistaken, your program code is to generate a report, instead of a new data variable. I would like to claculate the percent of male connected (which defined as per the question) in each ticker (ie firm) for every year.
mspak
OK. Easy.
libname x v9 'c:\'; %macro percent(company= ,year= ); proc sort data=x.Connected_male(where=(ticker="&company" and year=&year and dirgender='Male')) out=have(keep=iddir) nodupkey; by iddir ; run; proc sort data=x.Connected_male out=dir(keep=iddir ticker rename=(iddir=_iddir) ) nodupkey; by iddir ticker ; run; proc sort data=x.Connected_male out=gender(keep=dirgender ticker rename=(ticker=_ticker)) nodupkey; by ticker dirgender ; run; data temp(keep=year company percent); length company $ 100; if _n_ eq 1 then do; if 0 then set dir ; declare hash dir(hashexp:10, dataset:'dir',multidata:'Y'); declare hiter _dir('dir'); dir.definekey('_iddir'); dir.definedata('_iddir','ticker'); dir.definedone(); if 0 then set gender; declare hash gender(hashexp:10,dataset:'gender',multidata:'Y') ; declare hiter _gender('gender'); gender.definekey('_ticker'); gender.definedata('_ticker','dirgender'); gender.definedone(); end; set have end=last; male+1; flag=0; do while(_dir.next()=0); if iddir eq _iddir and ticker ne "&company" then do; do while(_gender.next()=0); if ticker eq _ticker and dirgender eq: 'F' then flag=1; end; end; end; if flag=1 then yes+1; if last then do; company="&company"; year=&year; percent= yes/male; output ; end; run; proc append base=want data=temp ;run; %mend percent; proc sort data=x.Connected_male(keep=ticker year) out=index nodupkey; by ticker year ; run; data _null_; set index; call execute('%percent(company='||ticker||',year='||year||')'); run;
Ksharp
Thanks Ksharp,
I need time to understand the program. I know that you used the SAS Hash object function, but it is new to me. Will post an update.
Regards,
mspak
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.