Given Data
GVKEY | fyear | EXECID | TITLEANN | AGE |
002538 | 2007 | 02158 | president & CEO | 56 |
002538 | 2008 | 02158 | president & CEO | 57 |
002538 | 2009 | 02158 | 58 | |
002538 | 2010 | 02158 | 59 | |
002538 | 2011 | 42441 | 57 | |
002538 | 2012 | 42441 | 58 | |
002538 | 2013 | 42441 | 59 | |
002538 | 2014 | 42439 | Chief Financial Officer and Executive Vice President | 63 |
002538 | 2015 | 30776 | 50 | |
002538 | 2016 | 30776 | 51 | |
002554 | 1993 | 00222 | chmn. & CEO | |
002554 | 1994 | 00222 | chmn. & CEO | 59 |
002554 | 1995 | 00222 | chmn. & CEO | 60 |
002554 | 1996 | 00222 | chmn. & CEO | |
002554 | 1997 | 00222 | chmn. & CEO | 62 |
002554 | 1998 | 00222 | chmn. & CEO | |
002555 | 1992 | 05687 | ||
002555 | 1993 | 05687 | chmn. & CEO | 62 |
002555 | 1994 | 05687 | chmn. & CEO | 63 |
002555 | 1995 | 05687 | chmn. & CEO | 64 |
002555 | 1996 | 05687 | chmn. & CEO | 65 |
002555 | 1997 | 05687 | chmn. & CEO | 66 |
002555 | 1998 | 05687 | chmn. & CEO | 67 |
002555 | 1999 | 05687 | chmn. & CEO | 68 |
002555 | 2000 | 05687 | chairman & CEO | 69 |
002555 | 2001 | 23867 | chairman & CEO | 55 |
002555 | 2002 | 23867 | chairman, president & CEO | 56 |
002555 | 2003 | 23867 | chairman, president & CEO | |
002555 | 2005 | 28398 | chairman | 37 |
002555 | 2006 | 30037 | former CEO | |
002555 | 2007 | 30038 | CEO | 47 |
002555 | 2008 | 30038 | president & CEO | 48 |
002555 | 2009 | 30038 | 49 | |
002555 | 2010 | 30038 | 50 | |
002555 | 2011 | 30038 | 51 | |
002562 | 1992 | 00263 | chmn., pres. & CEO | |
002562 | 1993 | 00263 | chmn., pres. & CEO | 54 |
002562 | 1994 | 00263 | chmn., pres. & CEO | 55 |
I want the following results:
So, I want to fill the Titleann and age column. Some of the missing observations will be filled based on previously given information(yellow colored one) and some of the observations will be filled based on the later information(green colored one). And It is not possible to fill some of the missing values as there is no previous and after values are given for those. Missing values will be filled based on "EXECID" Also, GVKEY and FYEAR are important because the same executive may serve in different companies in different year. Can anyone please help me by giving the code?
Use a double DO loop to determine the necessary values per execid. The dataset has to be sorted by gvkey and fyear.
data have;
infile datalines dlm=";" dsd truncover;
input gvkey $ fyear execid $ titleann :$50. age;
datalines4;
002538;2007;02158;president & CEO;56
002538;2008;02158;president & CEO;57
002538;2009;02158;;58
002538;2010;02158;;59
002538;2011;42441;;57
002538;2012;42441;;58
002538;2013;42441;;59
002538;2014;42439;Chief Financial Officer and Executive Vice President;63
002538;2015;30776;;50
002538;2016;30776;;51
002554;1993;00222;chmn. & CEO;
002554;1994;00222;chmn. & CEO;59
002554;1995;00222;chmn. & CEO;60
002554;1996;00222;chmn. & CEO;
002554;1997;00222;chmn. & CEO;62
002554;1998;00222;chmn. & CEO;
002555;1992;05687;;
002555;1993;05687;chmn. & CEO;62
002555;1994;05687;chmn. & CEO;63
002555;1995;05687;chmn. & CEO;64
002555;1996;05687;chmn. & CEO;65
002555;1997;05687;chmn. & CEO;66
002555;1998;05687;chmn. & CEO;67
002555;1999;05687;chmn. & CEO;68
002555;2000;05687;chairman & CEO;69
002555;2001;23867;chairman & CEO;55
002555;2002;23867;chairman, president & CEO;56
002555;2003;23867;chairman, president & CEO;
002555;2005;28398;chairman;37
002555;2006;30037;former CEO;
002555;2007;30038;CEO;47
002555;2008;30038;president & CEO;48
002555;2009;30038;;49
002555;2010;30038;;50
002555;2011;30038;;51
002562;1992;00263;chmn., pres. & CEO;
002562;1993;00263;chmn., pres. & CEO;54
002562;1994;00263;chmn., pres. & CEO;55
;;;;
data want;
do until (last.execid);
set have;
by gvkey execid notsorted;
if age ne . then byear = fyear - age;
if titleann ne "" then _titleann = titleann;
end;
do until (last.execid);
set have;
by gvkey execid notsorted;
titleann = coalescec(titleann,_titleann);
if byear ne . then age = fyear - byear;
output;
end;
drop byear _titleann;
run;
Note how I created the example dataset with a working (no ERRORs, WARNINGs or extraneous NOTEs) data step with datalines; please provide data in this form in the future, as it makes it clear what is contained in your data with respect to variable attributes and contents.
This is how logs from clean steps look:
73 data have; 74 infile datalines dlm=";" dsd truncover; 75 input gvkey $ fyear execid $ titleann :$50. age; 76 datalines4; NOTE: The data set WORK.HAVE has 38 observations and 5 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.00 seconds 115 ;;;; 116 117 data want; 118 do until (last.execid); 119 set have; 120 by gvkey execid notsorted; 121 if age ne . then byear = fyear - age; 122 if titleann ne "" then _titleann = titleann; 123 end; 124 do until (last.execid); 125 set have; 126 by gvkey execid notsorted; 127 titleann = coalescec(titleann,_titleann); 128 if byear ne . then age = fyear - byear; 129 output; 130 end; 131 drop byear _titleann; 132 run; NOTE: There were 38 observations read from the data set WORK.HAVE. NOTE: There were 38 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 38 observations and 5 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.01 seconds
Use a double DO loop to determine the necessary values per execid. The dataset has to be sorted by gvkey and fyear.
data have;
infile datalines dlm=";" dsd truncover;
input gvkey $ fyear execid $ titleann :$50. age;
datalines4;
002538;2007;02158;president & CEO;56
002538;2008;02158;president & CEO;57
002538;2009;02158;;58
002538;2010;02158;;59
002538;2011;42441;;57
002538;2012;42441;;58
002538;2013;42441;;59
002538;2014;42439;Chief Financial Officer and Executive Vice President;63
002538;2015;30776;;50
002538;2016;30776;;51
002554;1993;00222;chmn. & CEO;
002554;1994;00222;chmn. & CEO;59
002554;1995;00222;chmn. & CEO;60
002554;1996;00222;chmn. & CEO;
002554;1997;00222;chmn. & CEO;62
002554;1998;00222;chmn. & CEO;
002555;1992;05687;;
002555;1993;05687;chmn. & CEO;62
002555;1994;05687;chmn. & CEO;63
002555;1995;05687;chmn. & CEO;64
002555;1996;05687;chmn. & CEO;65
002555;1997;05687;chmn. & CEO;66
002555;1998;05687;chmn. & CEO;67
002555;1999;05687;chmn. & CEO;68
002555;2000;05687;chairman & CEO;69
002555;2001;23867;chairman & CEO;55
002555;2002;23867;chairman, president & CEO;56
002555;2003;23867;chairman, president & CEO;
002555;2005;28398;chairman;37
002555;2006;30037;former CEO;
002555;2007;30038;CEO;47
002555;2008;30038;president & CEO;48
002555;2009;30038;;49
002555;2010;30038;;50
002555;2011;30038;;51
002562;1992;00263;chmn., pres. & CEO;
002562;1993;00263;chmn., pres. & CEO;54
002562;1994;00263;chmn., pres. & CEO;55
;;;;
data want;
do until (last.execid);
set have;
by gvkey execid notsorted;
if age ne . then byear = fyear - age;
if titleann ne "" then _titleann = titleann;
end;
do until (last.execid);
set have;
by gvkey execid notsorted;
titleann = coalescec(titleann,_titleann);
if byear ne . then age = fyear - byear;
output;
end;
drop byear _titleann;
run;
Note how I created the example dataset with a working (no ERRORs, WARNINGs or extraneous NOTEs) data step with datalines; please provide data in this form in the future, as it makes it clear what is contained in your data with respect to variable attributes and contents.
This is how logs from clean steps look:
73 data have; 74 infile datalines dlm=";" dsd truncover; 75 input gvkey $ fyear execid $ titleann :$50. age; 76 datalines4; NOTE: The data set WORK.HAVE has 38 observations and 5 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.00 seconds 115 ;;;; 116 117 data want; 118 do until (last.execid); 119 set have; 120 by gvkey execid notsorted; 121 if age ne . then byear = fyear - age; 122 if titleann ne "" then _titleann = titleann; 123 end; 124 do until (last.execid); 125 set have; 126 by gvkey execid notsorted; 127 titleann = coalescec(titleann,_titleann); 128 if byear ne . then age = fyear - byear; 129 output; 130 end; 131 drop byear _titleann; 132 run; NOTE: There were 38 observations read from the data set WORK.HAVE. NOTE: There were 38 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 38 observations and 5 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.01 seconds
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.