BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
abdulla
Pyrite | Level 9

Given Data

GVKEYfyearEXECIDTITLEANNAGE
002538200702158president & CEO56
002538200802158president & CEO57
002538200902158 58
002538201002158 59
002538201142441 57
002538201242441 58
002538201342441 59
002538201442439Chief Financial Officer and Executive Vice President63
002538201530776 50
002538201630776 51
002554199300222chmn. & CEO 
002554199400222chmn. & CEO59
002554199500222chmn. & CEO60
002554199600222chmn. & CEO 
002554199700222chmn. & CEO62
002554199800222chmn. & CEO 
002555199205687  
002555199305687chmn. & CEO62
002555199405687chmn. & CEO63
002555199505687chmn. & CEO64
002555199605687chmn. & CEO65
002555199705687chmn. & CEO66
002555199805687chmn. & CEO67
002555199905687chmn. & CEO68
002555200005687chairman & CEO69
002555200123867chairman & CEO55
002555200223867chairman, president & CEO56
002555200323867chairman, president & CEO 
002555200528398chairman37
002555200630037former CEO 
002555200730038CEO47
002555200830038president & CEO48
002555200930038 49
002555201030038 50
002555201130038 51
002562199200263chmn., pres. & CEO 
002562199300263chmn., pres. & CEO54
002562199400263chmn., pres. & CEO55


I want the following results:

Capture1.PNG

So, I want to fill the Titleann and age column. Some of the missing observations will be filled based on previously given information(yellow colored one) and some of the observations will be filled based on the later information(green colored one). And It is not possible to fill some of the missing values as there is no previous and after values are given for those. Missing values will be filled based on "EXECID" Also, GVKEY and FYEAR are important because the same executive may serve in different companies in different year. Can anyone please help me by giving the code?

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Use a double DO loop to determine the necessary values per execid. The dataset has to be sorted by gvkey and fyear.

data have;
infile datalines dlm=";" dsd truncover;
input gvkey $ fyear execid $ titleann :$50. age;
datalines4;
002538;2007;02158;president & CEO;56
002538;2008;02158;president & CEO;57
002538;2009;02158;;58
002538;2010;02158;;59
002538;2011;42441;;57
002538;2012;42441;;58
002538;2013;42441;;59
002538;2014;42439;Chief Financial Officer and Executive Vice President;63
002538;2015;30776;;50
002538;2016;30776;;51
002554;1993;00222;chmn. & CEO;  
002554;1994;00222;chmn. & CEO;59
002554;1995;00222;chmn. & CEO;60
002554;1996;00222;chmn. & CEO; 
002554;1997;00222;chmn. & CEO;62
002554;1998;00222;chmn. & CEO; 
002555;1992;05687;;   
002555;1993;05687;chmn. & CEO;62
002555;1994;05687;chmn. & CEO;63
002555;1995;05687;chmn. & CEO;64
002555;1996;05687;chmn. & CEO;65
002555;1997;05687;chmn. & CEO;66
002555;1998;05687;chmn. & CEO;67
002555;1999;05687;chmn. & CEO;68
002555;2000;05687;chairman & CEO;69
002555;2001;23867;chairman & CEO;55
002555;2002;23867;chairman, president & CEO;56
002555;2003;23867;chairman, president & CEO;  
002555;2005;28398;chairman;37
002555;2006;30037;former CEO;   
002555;2007;30038;CEO;47
002555;2008;30038;president & CEO;48
002555;2009;30038;;49
002555;2010;30038;;50
002555;2011;30038;;51
002562;1992;00263;chmn., pres. & CEO; 
002562;1993;00263;chmn., pres. & CEO;54
002562;1994;00263;chmn., pres. & CEO;55
;;;;

data want;
do until (last.execid);
  set have;
  by gvkey execid notsorted;
  if age ne . then byear = fyear - age;
  if titleann ne "" then _titleann = titleann;
end;
do until (last.execid);
  set have;
  by gvkey execid notsorted;
  titleann = coalescec(titleann,_titleann);
  if byear ne . then age = fyear - byear;
  output;
end;
drop byear _titleann;
run;

Note how I created the example dataset with a working (no ERRORs, WARNINGs or extraneous NOTEs) data step with datalines; please provide data in this form in the future, as it makes it clear what is contained in your data with respect to variable attributes and contents.

This is how logs from clean steps look:

 73         data have;
 74         infile datalines dlm=";" dsd truncover;
 75         input gvkey $ fyear execid $ titleann :$50. age;
 76         datalines4;
 
 NOTE: The data set WORK.HAVE has 38 observations and 5 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.00 seconds
       
 
 115        ;;;;
 116        
 117        data want;
 118        do until (last.execid);
 119          set have;
 120          by gvkey execid notsorted;
 121          if age ne . then byear = fyear - age;
 122          if titleann ne "" then _titleann = titleann;
 123        end;
 124        do until (last.execid);
 125          set have;
 126          by gvkey execid notsorted;
 127          titleann = coalescec(titleann,_titleann);
 128          if byear ne . then age = fyear - byear;
 129          output;
 130        end;
 131        drop byear _titleann;
 132        run;
 
 NOTE: There were 38 observations read from the data set WORK.HAVE.
 NOTE: There were 38 observations read from the data set WORK.HAVE.
 NOTE: The data set WORK.WANT has 38 observations and 5 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.01 seconds

View solution in original post

1 REPLY 1
Kurt_Bremser
Super User

Use a double DO loop to determine the necessary values per execid. The dataset has to be sorted by gvkey and fyear.

data have;
infile datalines dlm=";" dsd truncover;
input gvkey $ fyear execid $ titleann :$50. age;
datalines4;
002538;2007;02158;president & CEO;56
002538;2008;02158;president & CEO;57
002538;2009;02158;;58
002538;2010;02158;;59
002538;2011;42441;;57
002538;2012;42441;;58
002538;2013;42441;;59
002538;2014;42439;Chief Financial Officer and Executive Vice President;63
002538;2015;30776;;50
002538;2016;30776;;51
002554;1993;00222;chmn. & CEO;  
002554;1994;00222;chmn. & CEO;59
002554;1995;00222;chmn. & CEO;60
002554;1996;00222;chmn. & CEO; 
002554;1997;00222;chmn. & CEO;62
002554;1998;00222;chmn. & CEO; 
002555;1992;05687;;   
002555;1993;05687;chmn. & CEO;62
002555;1994;05687;chmn. & CEO;63
002555;1995;05687;chmn. & CEO;64
002555;1996;05687;chmn. & CEO;65
002555;1997;05687;chmn. & CEO;66
002555;1998;05687;chmn. & CEO;67
002555;1999;05687;chmn. & CEO;68
002555;2000;05687;chairman & CEO;69
002555;2001;23867;chairman & CEO;55
002555;2002;23867;chairman, president & CEO;56
002555;2003;23867;chairman, president & CEO;  
002555;2005;28398;chairman;37
002555;2006;30037;former CEO;   
002555;2007;30038;CEO;47
002555;2008;30038;president & CEO;48
002555;2009;30038;;49
002555;2010;30038;;50
002555;2011;30038;;51
002562;1992;00263;chmn., pres. & CEO; 
002562;1993;00263;chmn., pres. & CEO;54
002562;1994;00263;chmn., pres. & CEO;55
;;;;

data want;
do until (last.execid);
  set have;
  by gvkey execid notsorted;
  if age ne . then byear = fyear - age;
  if titleann ne "" then _titleann = titleann;
end;
do until (last.execid);
  set have;
  by gvkey execid notsorted;
  titleann = coalescec(titleann,_titleann);
  if byear ne . then age = fyear - byear;
  output;
end;
drop byear _titleann;
run;

Note how I created the example dataset with a working (no ERRORs, WARNINGs or extraneous NOTEs) data step with datalines; please provide data in this form in the future, as it makes it clear what is contained in your data with respect to variable attributes and contents.

This is how logs from clean steps look:

 73         data have;
 74         infile datalines dlm=";" dsd truncover;
 75         input gvkey $ fyear execid $ titleann :$50. age;
 76         datalines4;
 
 NOTE: The data set WORK.HAVE has 38 observations and 5 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.00 seconds
       
 
 115        ;;;;
 116        
 117        data want;
 118        do until (last.execid);
 119          set have;
 120          by gvkey execid notsorted;
 121          if age ne . then byear = fyear - age;
 122          if titleann ne "" then _titleann = titleann;
 123        end;
 124        do until (last.execid);
 125          set have;
 126          by gvkey execid notsorted;
 127          titleann = coalescec(titleann,_titleann);
 128          if byear ne . then age = fyear - byear;
 129          output;
 130        end;
 131        drop byear _titleann;
 132        run;
 
 NOTE: There were 38 observations read from the data set WORK.HAVE.
 NOTE: There were 38 observations read from the data set WORK.HAVE.
 NOTE: The data set WORK.WANT has 38 observations and 5 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.01 seconds

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 424 views
  • 1 like
  • 2 in conversation