Hi SAS masters,
I am trying to clean my dataset for -3 and +3 year window.
My current data looks like..
Data test;
input ID MANAGER YEAR POST_TURNOVER;
datalines;
1001 a 1990 0
1001 a 1991 0
1001 a 1992 0
1001 a 1993 0
1001 a 1994 0
1001 a 1995 1
1001 a 1996 1
1001 a 1997 1
1001 a 1998 1
1001 a 1999 1
1001 a 1995 0
1001 a 1996 0
1001 a 1997 0
1001 a 1998 0
1001 a 1999 0
1001 a 2000 0
1001 b 2000 1
1001 b 2001 1
1001 b 2002 1
1001 b 2003 1
1002 c 1989 0
1002 c 1990 0
1002 c 1991 0
1002 c 1992 0
1002 c 1993 1
1002 c 1994 1
1002 c 1995 1
1002 c 1996 1
1002 c 1997 1
;
run;
My question here is how I can specify my dataset with -3 and +3 window (excluding the turnover year) for turnovers by ID and Manager.
My data should look like..
ID MANAGER YEAR POST_TURNOVER Turnover
1001 a 1992 0 0
1001 a 1993 0 0
1001 a 1994 0 0
1001 a 1995 1 1
1001 a 1996 1 0
1001 a 1997 1 0
1001 a 1998 1 0
1001 a 1998 0 0
1001 a 1999 0 0
1001 a 2000 0 0
1001 b 2000 1 1
1001 b 2001 1 0
1001 b 2002 1 0
1001 b 2003 1 0
1002 c 1990 0 0
1002 c 1991 0 0
1002 c 1992 0 0
1002 c 1993 1 1
1002 c 1994 1 0
1002 c 1995 1 0
1002 c 1996 1 0
Thank you!
First of all, Thank you for the very interesting question
Second of all, My apologies for the lack of attention on my part
Third of all, Sorry for the delay after early vow as I am not feeling well
fourth of all, I would wait for better answers from SAS champs like Art T, K sharp, Tom, mkeintz, chris nz and others
last of all, I will await for your feedback should you require any further changes and in the meantime will have my lunch 🙂
Data test;
input ID MANAGER $ YEAR POST_TURNOVER;
datalines;
1001 a 1990 0
1001 a 1991 0
1001 a 1992 0
1001 a 1993 0
1001 a 1994 0
1001 a 1995 1
1001 a 1996 1
1001 a 1997 1
1001 a 1998 1
1001 a 1999 1
1001 a 1995 0
1001 a 1996 0
1001 a 1997 0
1001 a 1998 0
1001 a 1999 0
1001 a 2000 0
1001 b 2000 1
1001 b 2001 1
1001 b 2002 1
1001 b 2003 1
1002 c 1989 0
1002 c 1990 0
1002 c 1991 0
1002 c 1992 0
1002 c 1993 1
1002 c 1994 1
1002 c 1995 1
1002 c 1996 1
1002 c 1997 1
;
run;
data temp;
set test;
by id POST_TURNOVER notsorted;
if first.POST_TURNOVER and POST_TURNOVER then TURNOVER=1;
else TURNOVER=0;
run;
data want;
drop _:;
if _n_=1 then do;
if 0 then set temp;
dcl hash H () ;
h.definekey ("id","year") ;
h.definedata ("MANAGER","YEAR","POST_TURNOVER","TURNOVER") ;
h.definedone () ;
end;
set temp end=last;
by id;
retain _year;
if first.id then call missing(_year);
if _count=0 then h.replace();
if turnover=1 then do;
_year=year;
_turnover=turnover;
do _i=_year-3 to _year;
if h.find(key:id, key:_i)=0 then output;
end;
end;
if not turnover and _year then do;
if _count<=3 then do;
if _year+1<=year<=_year+3 then _count+1;
if _count<=3 then output;
end;
end;
if _count=3 then do; _count=0;call missing(_year);end;
run;
will this help?
Data test;
input ID MANAGER YEAR POST_TURNOVER;
datalines;
1001 a 1990 0
1001 a 1991 0
1001 a 1992 0
1001 a 1993 0
1001 a 1994 0
1001 a 1995 1
1001 a 1996 1
1001 a 1997 1
1001 a 1998 1
1001 a 1999 1
1001 a 1995 0
1001 a 1996 0
1001 a 1997 0
1001 a 1998 0
1001 a 1999 0
1001 a 2000 0
1001 b 2000 1
1001 b 2001 1
1001 b 2002 1
1001 b 2003 1
1002 c 1989 0
1002 c 1990 0
1002 c 1991 0
1002 c 1992 0
1002 c 1993 1
1002 c 1994 1
1002 c 1995 1
1002 c 1996 1
1002 c 1997 1
;
run;
data want;
set test;
by id POST_TURNOVER notsorted;
if first.POST_TURNOVER and POST_TURNOVER then TURNOVER=1;
else TURNOVER=0;
run;
How can I specify -3 and +3 event window (excluding the turnover year)?
@hkim3677 Does n;t my result match your wanted output? what should your output look like?
My data should look like..
ID Manager Year Post_Turnover Turnover
1001 a 1992 0 0
1001 a 1993 0 0
1001 a 1994 0 0
1001 a 1995 1 1
1001 a 1996 1 0
1001 a 1997 1 0
1001 a 1998 1 0
1001 a 1998 0 0
1001 a 1999 0 0
1001 a 2000 0 0
1001 b 2000 1 1
1001 b 2001 1 0
1001 b 2002 1 0
1001 b 2003 1 0
1002 c 1990 0 0
1002 c 1991 0 0
1002 c 1992 0 0
1002 c 1993 1 1
1002 c 1994 1 0
1002 c 1995 1 0
1002 c 1996 1 0
As you can see.. I need to restrict my original sample to observations having 3 pre- and post-turnover observations for each manager (excluding the turnover year) or it could be -2 and +2 as well. In other words, the -3 and +3 window allows (-1,+1) (-2,+1) (-3, +1) (-3.+2), etc.
Please help!
Oh My apologies, i haven't got over the hung over from last night. so sorry
Haha.. its okay.
Can you help me the code?
@hkim3677 wrote:
Hi SAS masters,
I am trying to clean my dataset for -3 and +3 year window.
-3 to +3 from what? We need a reference rule. If you wanted all the same years from a single base year, say 1997
it would be as simple as:
if abs(year-1997) le 3;
But obviously you have different years in mind for each manager. What is the rule for selecting the base value for each manager?
I have to say that with your ID= 1001 and manager = a showing years 1992 to 2000 doesn't jib with any of my simple interpretations of -3 to +3.
The reference point year is the turnover year.
So, I need to retain only -3 and +3 year observations based on the turnover point. Each managers and firm has different turnover year, so I do not have general turnover year. The turnover year varies across firm and manager.
First of all, Thank you for the very interesting question
Second of all, My apologies for the lack of attention on my part
Third of all, Sorry for the delay after early vow as I am not feeling well
fourth of all, I would wait for better answers from SAS champs like Art T, K sharp, Tom, mkeintz, chris nz and others
last of all, I will await for your feedback should you require any further changes and in the meantime will have my lunch 🙂
Data test;
input ID MANAGER $ YEAR POST_TURNOVER;
datalines;
1001 a 1990 0
1001 a 1991 0
1001 a 1992 0
1001 a 1993 0
1001 a 1994 0
1001 a 1995 1
1001 a 1996 1
1001 a 1997 1
1001 a 1998 1
1001 a 1999 1
1001 a 1995 0
1001 a 1996 0
1001 a 1997 0
1001 a 1998 0
1001 a 1999 0
1001 a 2000 0
1001 b 2000 1
1001 b 2001 1
1001 b 2002 1
1001 b 2003 1
1002 c 1989 0
1002 c 1990 0
1002 c 1991 0
1002 c 1992 0
1002 c 1993 1
1002 c 1994 1
1002 c 1995 1
1002 c 1996 1
1002 c 1997 1
;
run;
data temp;
set test;
by id POST_TURNOVER notsorted;
if first.POST_TURNOVER and POST_TURNOVER then TURNOVER=1;
else TURNOVER=0;
run;
data want;
drop _:;
if _n_=1 then do;
if 0 then set temp;
dcl hash H () ;
h.definekey ("id","year") ;
h.definedata ("MANAGER","YEAR","POST_TURNOVER","TURNOVER") ;
h.definedone () ;
end;
set temp end=last;
by id;
retain _year;
if first.id then call missing(_year);
if _count=0 then h.replace();
if turnover=1 then do;
_year=year;
_turnover=turnover;
do _i=_year-3 to _year;
if h.find(key:id, key:_i)=0 then output;
end;
end;
if not turnover and _year then do;
if _count<=3 then do;
if _year+1<=year<=_year+3 then _count+1;
if _count<=3 then output;
end;
end;
if _count=3 then do; _count=0;call missing(_year);end;
run;
If I understand your question correctly, this is not a complicated task.
Do a SET statement with TEST as the object 2 times
The SET statement will pass through all the turnover years for a given id first, allowing generation of minyear (minyear=first turnover year minus 3) and maxyear (equal last turnover year +3) . The data step will then pass through ALL years for the same id, where comparison against minyear and maxyear can be tested.
Data test;
input ID MANAGER $ YEAR POST_TURNOVER;
datalines;
1001 a 1990 0
1001 a 1991 0
1001 a 1992 0
1001 a 1993 0
1001 a 1994 0
1001 a 1995 1
1001 a 1996 1
1001 a 1997 1
1001 a 1998 1
1001 a 1999 1
1001 a 1995 0
1001 a 1996 0
1001 a 1997 0
1001 a 1998 0
1001 a 1999 0
1001 a 2000 0
1001 b 2000 1
1001 b 2001 1
1001 b 2002 1
1001 b 2003 1
1002 c 1989 0
1002 c 1990 0
1002 c 1991 0
1002 c 1992 0
1002 c 1993 1
1002 c 1994 1
1002 c 1995 1
1002 c 1996 1
1002 c 1997 1
run;
data want;
set test (where=(post_turnover=1) in=in1)
test (in=in2);
by id manager;
retain min_year max_year;
if first.manager then call missing (min_year,max_year);
if first.manager and in1=1 then min_year=year-3;
else if in1 then max_year=year+3;
if in2=1 and (min_year<=year<=max_year);
run;
Notes
The program allows for multi-year event spans (a sequence of post_turnover=1 records), but does not accommodate multiple events separated by non-event years.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.