Hi All,
I am wanting to keep only observations where this is both a CEO and a CFO for every GVKEYYEAR.
I HAVE the following:
Name | CEOANN | CFOANN | GVKEYYEAR |
Batman | CEO | 2008185 | |
Robin | CFO | 2008185 | |
Mickey | CEO | 2001053 | |
Mikey | CFO | 2088087 | |
Leo | CEO | 2088087 | |
Goofy | CFO | 2109087 | |
Superman | CEO | 2089171 | |
Peanut Butter | CEO | 1089721 | |
Jelly | CFO | 1089721 |
What I WANT is the following:
Name | CEOANN | CFOANN | GVKEYYEAR |
Batman | CEO | 2008185 | |
Robin | CFO | 2008185 | |
Mikey | CFO | 2088087 | |
Leo | CEO | 2088087 | |
Peanut Butter | CEO | 1089721 | |
Jelly | CFO | 1089721 |
What code may help me do this?
Thank you!
data have;
infile cards dlm="|";
input Name :$32. (CEOANN CFOANN) (:$3.) GVKEYYEAR;
cards;
Batman|CEO| |2008185
Robin| |CFO|2008185
Mickey|CEO| |2001053
Mikey| |CFO|2088087
Leo|CEO| |2088087
Goofy| |CFO|2109087
Superman|CEO| |2089171
Peanut Butter|CEO| |1089721
Jelly| |CFO|1089721
;
run;
proc print;
run;
data want;
call missing(check1,check2);
do _N_=1 by 1 until(last.GVKEYYEAR);
set have;
by GVKEYYEAR notsorted;
check1+(CEOANN='CEO');
check2+(CFOANN='CFO');
end;
do _N_=1 to _N_;
set have;
if check1 & check2 then output;
end;
drop check1 check2;
run;
proc print;
run;
This works for example data provided
proc sql; create table want as select b.* from ( select gvkeyyear from have where not missing(ceoann) or not missing(cfoAnn) group gvkeyyear having count(gvkeyyear)=2 ) as a left join have as b on a.gvkeyyear=b.gvkeyyear ; quit;
The Where may not be needed but might if you have other observations in your data for the gvkeyyear related to other data not shown in your example.
If you built that set to have two separate variables for Ceoann and CfoAnn you might consider a single variable with the position value, such as CEO and CFO. That is quite often more flexible in the long run than adding a bunch of variables for each level of meaning.
Your data appears to be already presented either as a single record for a given GVKEYYEAR, or a pair of records for a GVKEYYEAR. In the latter case, one CEO appears and one CFO appears. If that is the case, then just delete all the singletons and keep all the pairs.
data have;
infile cards dlm="|";
input Name :$32. (CEOANN CFOANN) (:$3.) GVKEYYEAR;
cards;
Batman|CEO| |2008185
Robin| |CFO|2008185
Mickey|CEO| |2001053
Mikey| |CFO|2088087
Leo|CEO| |2088087
Goofy| |CFO|2109087
Superman|CEO| |2089171
Peanut Butter|CEO| |1089721
Jelly| |CFO|1089721
run;
data want;
set have ;
by gvkeyyear notsorted;
if not(first.gvkeyyear=1 and last.gvkeyyear=1);
run;
data have;
infile cards dlm="|";
input Name :$32. (CEOANN CFOANN) (:$3.) GVKEYYEAR;
cards;
Batman|CEO| |2008185
Robin| |CFO|2008185
Mickey|CEO| |2001053
Mikey| |CFO|2088087
Leo|CEO| |2088087
Goofy| |CFO|2109087
Superman|CEO| |2089171
Peanut Butter|CEO| |1089721
Jelly| |CFO|1089721
;
run;
proc sql;
create table want as
select * from have
group by GVKEYYEAR
having sum(CEOANN='CEO') and sum(CFOANN='CFO');
quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.