Hi all,
I want to backfill the ExecID data if BECAMECEO<=fyear<=LEFTOFC.
Take year 1979 as an example, the EXECID should be the EXECID in 1994 because 1979 is within the executive employment period.
year in ExecuComp | Date became CEO | date left CEO | name | EXECID |
1979 | ||||
1980 | ||||
1981 | ||||
1982 | ||||
1983 | ||||
1984 | ||||
1985 | ||||
1986 | ||||
1987 | ||||
1988 | ||||
1989 | ||||
1990 | ||||
1991 | ||||
1992 | ||||
1993 | ||||
1994 | 01Jan1955 | 01OCT1996 | Ira A. Eichner | 9248 |
1995 | 01Jan1955 | 01OCT1996 | Ira A. Eichner | 9249 |
1996 | 09OCT1996 | 31MAY2018 | David P. Storch | 9249 |
How can I code this to backfill the data?
If you want a reliable, tested code sample, please provide your sample data in the form of a working data step. You don't even have a column named fyear, yet your refer to it in your question. Probably we can guess, but is it a good idea to force your advisors to guess? Help us help you.
This is the data I have:
Fyear | Date became CEO | date left CEO | name | EXECID |
1979 | . | |||
1980 | . | |||
1981 | . | |||
1982 | . | |||
1983 | . | |||
1984 | . | |||
1985 | . | |||
1986 | . | |||
1987 | . | |||
1988 | . | |||
1989 | . | |||
1990 | . | |||
1991 | . | |||
1992 | . | |||
1993 | . | |||
1994 | 01Jan1955 | 01OCT1996 | Ira A. Eichner | 9248 |
1995 | 01Jan1955 | 01OCT1996 | Ira A. Eichner | 9249 |
1996 | 09OCT1996 | 31MAY2018 | David P. Storch | 9249 |
And this is the data I want:
Fyear | Date became CEO | date left CEO | name | EXECID |
1979 | 9248 | |||
1980 | 9248 | |||
1981 | 9248 | |||
1982 | 9248 | |||
1983 | 9248 | |||
1984 | 9248 | |||
1985 | 9248 | |||
1986 | 9248 | |||
1987 | 9248 | |||
1988 | 9248 | |||
1989 | 9248 | |||
1990 | 9248 | |||
1991 | 9248 | |||
1992 | 9248 | |||
1993 | 9248 | |||
1994 | 01Jan1955 | 01OCT1996 | Ira A. Eichner | 9248 |
1995 | 01Jan1955 | 01OCT1996 | Ira A. Eichner | 9249 |
1996 | 09OCT1996 | 31MAY2018 | David P. Storch | 9249 |
Try this
/* First sort descending to give the latest values at top */
proc sort data=have out=have_srtd;
by descending 'year in ExecuComp'n ;
run;
/* Keep track the first non-missing Previous value and use it to replace the missing value */
data want(drop=prev_val);
set work.have_srtd;
retain prev_val;
if (lag(EXECID) ge 0) then prev_val=lag(EXECID);
put prev_val= EXECID=;
if (missing(EXECID)=1) then EXECID=prev_val;
put prev_val= EXECID=;
run;
/* Restore original sort order */
proc sort data=work.want;
by 'year in ExecuComp'n ;
run;
Hi AhmedA1,
Thanks for your reply.
I did not quite understand your code, but I did not see the code like 'BECAMECEO<=year in execucomp<=LEFTOFC' appears, and this is a really important condition that I want.
Editte note: you still haven't provided a working data step, so the code below is untested.
This looks like Execucomp data from Compustat. If so, then fiscal year (variable FYEAR) need not include January 1st of the same nominal year. For instance, fiscal year May 1993-April 1994 is identified as fiscal year 1993 (more months in calendar 1993 than calendar 1994), but does not include Jan 1, 1993. If a person became CEO of such a company in February 1994 you would presumably miss the fact that they overlap with FYEAR=1993. Isn't there another variable indicating which month (with possible values 1 through 12) is the last of a fiscal year Fiscal years ending in January - through May are identified with the year in which they start. Fiscal years ending in June through December are identified with the year containing the end month.
And more generally, you haven't stated what you want done when two people are CEO's in the same FYEAR. You show one record per FYEAR. So if there are two (or more!) CEO's in one fiscal year, what do you want in the result? Multiple records? One record with the CEO having has the most months? But if so, then how will you determine which CEO has the most months? Especially given that fiscal years need not align with calendar years.
Also, presumably you have this need to retrieve future values for multiple companies (multiple gvkeys?), so you would need your sample data to have gvkey, and to be sorted by gvkey fyear.
data want (drop=y);
set have (in=firstpass) have (in=secondpass);
by gvkey ;
array id_hist {1990:2020} _temporary_;
if first.gvkey then call missing(of id_hist{*});
if firstpass=1 and nmiss(begindate,enddate)=0 then do y=year(begindate) to year(enddate);
id_hist(y)=execid;
end;
if secondpass;
if missing(execid) then execid=id_hist{fyear};
run;
For any observation with a blank execid, if that fyear value overlapped if more than one date range for CEO's, this program would return the latter execid.
Also note this program does not sort in descending chronological order, then carry "forward" values, then re-sort in ascending order. Instead it reads each GVKEY twice - once to establish an ID_HIST, the second time to retrieve from that history.
Finally, this program covers 1990 through 2020. If you need more years, change the lower and/or upper bounds of the id_hist array.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.