Solved: Fill the cell

abdulla · Posted 11-13-2020 04:24 PM

Given Data

GVKEY	fyear	EXECID	TITLEANN	AGE
002538	2007	02158	president & CEO	56
002538	2008	02158	president & CEO	57
002538	2009	02158		58
002538	2010	02158		59
002538	2011	42441		57
002538	2012	42441		58
002538	2013	42441		59
002538	2014	42439	Chief Financial Officer and Executive Vice President	63
002538	2015	30776		50
002538	2016	30776		51
002554	1993	00222	chmn. & CEO
002554	1994	00222	chmn. & CEO	59
002554	1995	00222	chmn. & CEO	60
002554	1996	00222	chmn. & CEO
002554	1997	00222	chmn. & CEO	62
002554	1998	00222	chmn. & CEO
002555	1992	05687
002555	1993	05687	chmn. & CEO	62
002555	1994	05687	chmn. & CEO	63
002555	1995	05687	chmn. & CEO	64
002555	1996	05687	chmn. & CEO	65
002555	1997	05687	chmn. & CEO	66
002555	1998	05687	chmn. & CEO	67
002555	1999	05687	chmn. & CEO	68
002555	2000	05687	chairman & CEO	69
002555	2001	23867	chairman & CEO	55
002555	2002	23867	chairman, president & CEO	56
002555	2003	23867	chairman, president & CEO
002555	2005	28398	chairman	37
002555	2006	30037	former CEO
002555	2007	30038	CEO	47
002555	2008	30038	president & CEO	48
002555	2009	30038		49
002555	2010	30038		50
002555	2011	30038		51
002562	1992	00263	chmn., pres. & CEO
002562	1993	00263	chmn., pres. & CEO	54
002562	1994	00263	chmn., pres. & CEO	55

I want the following results:

So, I want to fill the Titleann and age column. Some of the missing observations will be filled based on previously given information(yellow colored one) and some of the observations will be filled based on the later information(green colored one). And It is not possible to fill some of the missing values as there is no previous and after values are given for those. Missing values will be filled based on "EXECID" Also, GVKEY and FYEAR are important because the same executive may serve in different companies in different year. Can anyone please help me by giving the code?

Kurt_Bremser · Posted 11-14-2020 02:47 AM

Use a double DO loop to determine the necessary values per execid. The dataset has to be sorted by gvkey and fyear.

data have;
infile datalines dlm=";" dsd truncover;
input gvkey $ fyear execid $ titleann :$50. age;
datalines4;
002538;2007;02158;president & CEO;56
002538;2008;02158;president & CEO;57
002538;2009;02158;;58
002538;2010;02158;;59
002538;2011;42441;;57
002538;2012;42441;;58
002538;2013;42441;;59
002538;2014;42439;Chief Financial Officer and Executive Vice President;63
002538;2015;30776;;50
002538;2016;30776;;51
002554;1993;00222;chmn. & CEO;  
002554;1994;00222;chmn. & CEO;59
002554;1995;00222;chmn. & CEO;60
002554;1996;00222;chmn. & CEO; 
002554;1997;00222;chmn. & CEO;62
002554;1998;00222;chmn. & CEO; 
002555;1992;05687;;   
002555;1993;05687;chmn. & CEO;62
002555;1994;05687;chmn. & CEO;63
002555;1995;05687;chmn. & CEO;64
002555;1996;05687;chmn. & CEO;65
002555;1997;05687;chmn. & CEO;66
002555;1998;05687;chmn. & CEO;67
002555;1999;05687;chmn. & CEO;68
002555;2000;05687;chairman & CEO;69
002555;2001;23867;chairman & CEO;55
002555;2002;23867;chairman, president & CEO;56
002555;2003;23867;chairman, president & CEO;  
002555;2005;28398;chairman;37
002555;2006;30037;former CEO;   
002555;2007;30038;CEO;47
002555;2008;30038;president & CEO;48
002555;2009;30038;;49
002555;2010;30038;;50
002555;2011;30038;;51
002562;1992;00263;chmn., pres. & CEO; 
002562;1993;00263;chmn., pres. & CEO;54
002562;1994;00263;chmn., pres. & CEO;55
;;;;

data want;
do until (last.execid);
  set have;
  by gvkey execid notsorted;
  if age ne . then byear = fyear - age;
  if titleann ne "" then _titleann = titleann;
end;
do until (last.execid);
  set have;
  by gvkey execid notsorted;
  titleann = coalescec(titleann,_titleann);
  if byear ne . then age = fyear - byear;
  output;
end;
drop byear _titleann;
run;

Note how I created the example dataset with a working (no ERRORs, WARNINGs or extraneous NOTEs) data step with datalines; please provide data in this form in the future, as it makes it clear what is contained in your data with respect to variable attributes and contents.

This is how logs from clean steps look:

 73         data have;
 74         infile datalines dlm=";" dsd truncover;
 75         input gvkey $ fyear execid $ titleann :$50. age;
 76         datalines4;
 
 NOTE: The data set WORK.HAVE has 38 observations and 5 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.00 seconds
       
 
 115        ;;;;
 116        
 117        data want;
 118        do until (last.execid);
 119          set have;
 120          by gvkey execid notsorted;
 121          if age ne . then byear = fyear - age;
 122          if titleann ne "" then _titleann = titleann;
 123        end;
 124        do until (last.execid);
 125          set have;
 126          by gvkey execid notsorted;
 127          titleann = coalescec(titleann,_titleann);
 128          if byear ne . then age = fyear - byear;
 129          output;
 130        end;
 131        drop byear _titleann;
 132        run;
 
 NOTE: There were 38 observations read from the data set WORK.HAVE.
 NOTE: There were 38 observations read from the data set WORK.HAVE.
 NOTE: The data set WORK.WANT has 38 observations and 5 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.01 seconds

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 11-14-2020 02:47 AM

Use a double DO loop to determine the necessary values per execid. The dataset has to be sorted by gvkey and fyear.

data have;
infile datalines dlm=";" dsd truncover;
input gvkey $ fyear execid $ titleann :$50. age;
datalines4;
002538;2007;02158;president & CEO;56
002538;2008;02158;president & CEO;57
002538;2009;02158;;58
002538;2010;02158;;59
002538;2011;42441;;57
002538;2012;42441;;58
002538;2013;42441;;59
002538;2014;42439;Chief Financial Officer and Executive Vice President;63
002538;2015;30776;;50
002538;2016;30776;;51
002554;1993;00222;chmn. & CEO;  
002554;1994;00222;chmn. & CEO;59
002554;1995;00222;chmn. & CEO;60
002554;1996;00222;chmn. & CEO; 
002554;1997;00222;chmn. & CEO;62
002554;1998;00222;chmn. & CEO; 
002555;1992;05687;;   
002555;1993;05687;chmn. & CEO;62
002555;1994;05687;chmn. & CEO;63
002555;1995;05687;chmn. & CEO;64
002555;1996;05687;chmn. & CEO;65
002555;1997;05687;chmn. & CEO;66
002555;1998;05687;chmn. & CEO;67
002555;1999;05687;chmn. & CEO;68
002555;2000;05687;chairman & CEO;69
002555;2001;23867;chairman & CEO;55
002555;2002;23867;chairman, president & CEO;56
002555;2003;23867;chairman, president & CEO;  
002555;2005;28398;chairman;37
002555;2006;30037;former CEO;   
002555;2007;30038;CEO;47
002555;2008;30038;president & CEO;48
002555;2009;30038;;49
002555;2010;30038;;50
002555;2011;30038;;51
002562;1992;00263;chmn., pres. & CEO; 
002562;1993;00263;chmn., pres. & CEO;54
002562;1994;00263;chmn., pres. & CEO;55
;;;;

data want;
do until (last.execid);
  set have;
  by gvkey execid notsorted;
  if age ne . then byear = fyear - age;
  if titleann ne "" then _titleann = titleann;
end;
do until (last.execid);
  set have;
  by gvkey execid notsorted;
  titleann = coalescec(titleann,_titleann);
  if byear ne . then age = fyear - byear;
  output;
end;
drop byear _titleann;
run;

Note how I created the example dataset with a working (no ERRORs, WARNINGs or extraneous NOTEs) data step with datalines; please provide data in this form in the future, as it makes it clear what is contained in your data with respect to variable attributes and contents.

This is how logs from clean steps look:

 73         data have;
 74         infile datalines dlm=";" dsd truncover;
 75         input gvkey $ fyear execid $ titleann :$50. age;
 76         datalines4;
 
 NOTE: The data set WORK.HAVE has 38 observations and 5 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.00 seconds
       
 
 115        ;;;;
 116        
 117        data want;
 118        do until (last.execid);
 119          set have;
 120          by gvkey execid notsorted;
 121          if age ne . then byear = fyear - age;
 122          if titleann ne "" then _titleann = titleann;
 123        end;
 124        do until (last.execid);
 125          set have;
 126          by gvkey execid notsorted;
 127          titleann = coalescec(titleann,_titleann);
 128          if byear ne . then age = fyear - byear;
 129          output;
 130        end;
 131        drop byear _titleann;
 132        run;
 
 NOTE: There were 38 observations read from the data set WORK.HAVE.
 NOTE: There were 38 observations read from the data set WORK.HAVE.
 NOTE: The data set WORK.WANT has 38 observations and 5 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.01 seconds

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Fill the cell

Re: Fill the cell

Re: Fill the cell

Fill the cell

Re: Fill the cell

Re: Fill the cell

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!