BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hkim3677
Calcite | Level 5

Hi SAS masters,

 

I am trying to clean my dataset for -3 and +3 year window.

 

My current data looks like..

 

Data test;

input ID MANAGER YEAR POST_TURNOVER;

datalines;

1001 a 1990 0

1001 a 1991 0

1001 a 1992 0

1001 a 1993 0

1001 a 1994 0

1001 a 1995 1

1001 a 1996 1

1001 a 1997 1

1001 a 1998 1

1001 a 1999 1

1001 a 1995 0

1001 a 1996 0

1001 a 1997 0

1001 a 1998 0

1001 a 1999 0

1001 a 2000 0

1001 b 2000 1

1001 b 2001 1

1001 b 2002 1

1001 b 2003 1

1002 c 1989 0

1002 c 1990 0

1002 c 1991 0

1002 c 1992 0

1002 c 1993 1

1002 c 1994 1

1002 c 1995 1

1002 c 1996 1

1002 c 1997 1 

;

run;

 

My question here is how I can specify my dataset with -3 and +3 window (excluding the turnover year) for turnovers by ID and Manager.

My data should look like..

 

ID MANAGER YEAR POST_TURNOVER Turnover

 

 

 

1001 a 1992 0 0

1001 a 1993 0 0

1001 a 1994 0 0

1001 a 1995 1 1

1001 a 1996 1 0

1001 a 1997 1 0

1001 a 1998 1 0

1001 a 1998 0 0

1001 a 1999 0 0

1001 a 2000 0 0

1001 b 2000 1 1

1001 b 2001 1 0

1001 b 2002 1 0

1001 b 2003 1 0

1002 c 1990 0 0

1002 c 1991 0 0

1002 c 1992 0 0

1002 c 1993 1 1

1002 c 1994 1 0

1002 c 1995 1 0

1002 c 1996 1 0

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

@hkim3677 

First of all, Thank you for the very interesting question

Second of all, My apologies for the lack of attention on my part

Third of all, Sorry for the delay after early vow as I am not feeling well

fourth of all, I would wait for better answers from SAS champs like Art T, K sharp, Tom, mkeintz, chris nz  and others 

last of all, I will await for your feedback should you require any further changes and in the meantime will have my lunch 🙂

 

Data test;

input ID MANAGER $ YEAR POST_TURNOVER;

datalines;
1001 a 1990 0
1001 a 1991 0
1001 a 1992 0
1001 a 1993 0
1001 a 1994 0
1001 a 1995 1
1001 a 1996 1
1001 a 1997 1
1001 a 1998 1
1001 a 1999 1
1001 a 1995 0
1001 a 1996 0
1001 a 1997 0
1001 a 1998 0
1001 a 1999 0
1001 a 2000 0
1001 b 2000 1
1001 b 2001 1
1001 b 2002 1
1001 b 2003 1
1002 c 1989 0
1002 c 1990 0
1002 c 1991 0
1002 c 1992 0
1002 c 1993 1
1002 c 1994 1
1002 c 1995 1
1002 c 1996 1
1002 c 1997 1 
;

run;


data temp;
set test;
by id POST_TURNOVER notsorted;
if first.POST_TURNOVER and POST_TURNOVER then TURNOVER=1;
else TURNOVER=0;
run;


data want;
drop _:;
if _n_=1 then do;
if 0 then set temp;
dcl hash H () ;
   h.definekey  ("id","year") ;
   h.definedata ("MANAGER","YEAR","POST_TURNOVER","TURNOVER") ;
   h.definedone () ;
end;
set temp end=last;
by id;
retain _year;
if first.id then call missing(_year);
if _count=0 then h.replace();
if turnover=1 then do; 
	_year=year;
	_turnover=turnover;
	do _i=_year-3 to _year;
	if h.find(key:id, key:_i)=0 then output;
	end;
end;
if not turnover and _year then do;
	if _count<=3 then do;
	if _year+1<=year<=_year+3 then _count+1;
	if _count<=3 then output;
	end;
end;
if _count=3 then do; _count=0;call missing(_year);end;
run;

View solution in original post

10 REPLIES 10
novinosrin
Tourmaline | Level 20

will this help?

Data test;

input ID MANAGER YEAR POST_TURNOVER;

datalines;
1001 a 1990 0
1001 a 1991 0
1001 a 1992 0
1001 a 1993 0
1001 a 1994 0
1001 a 1995 1
1001 a 1996 1
1001 a 1997 1
1001 a 1998 1
1001 a 1999 1
1001 a 1995 0
1001 a 1996 0
1001 a 1997 0
1001 a 1998 0
1001 a 1999 0
1001 a 2000 0
1001 b 2000 1
1001 b 2001 1
1001 b 2002 1
1001 b 2003 1
1002 c 1989 0
1002 c 1990 0
1002 c 1991 0
1002 c 1992 0
1002 c 1993 1
1002 c 1994 1
1002 c 1995 1
1002 c 1996 1
1002 c 1997 1 
;

run;


data want;
set test;
by id POST_TURNOVER notsorted;
if first.POST_TURNOVER and POST_TURNOVER then TURNOVER=1;
else TURNOVER=0;
run;
hkim3677
Calcite | Level 5

How can I specify -3 and +3 event window (excluding the turnover year)?

novinosrin
Tourmaline | Level 20

@hkim3677 Does n;t my result match your wanted output? what should your output look like?

hkim3677
Calcite | Level 5

My data should look like..

ID Manager Year Post_Turnover Turnover

1001 a 1992 0 0

1001 a 1993 0 0

1001 a 1994 0 0

1001 a 1995 1 1

1001 a 1996 1 0

1001 a 1997 1 0

1001 a 1998 1 0

1001 a 1998 0 0

1001 a 1999 0 0

1001 a 2000 0 0

1001 b 2000 1 1

1001 b 2001 1 0

1001 b 2002 1 0

1001 b 2003 1 0

1002 c 1990 0 0

1002 c 1991 0 0

1002 c 1992 0 0

1002 c 1993 1 1

1002 c 1994 1 0

1002 c 1995 1 0

1002 c 1996 1 0

 

As you can see.. I need to restrict my original sample to observations having 3 pre- and post-turnover observations for each manager (excluding the turnover year) or it could be -2 and +2 as well. In other words, the -3 and +3 window allows (-1,+1) (-2,+1) (-3, +1) (-3.+2), etc.

 

Please help!

 

novinosrin
Tourmaline | Level 20

Oh My apologies, i haven't got over the hung over from last night. so sorry

hkim3677
Calcite | Level 5

Haha.. its okay.

 

Can you help me the code?

 

 

ballardw
Super User

@hkim3677 wrote:

Hi SAS masters,

 

I am trying to clean my dataset for -3 and +3 year window.


 

 

-3 to +3 from what? We need a reference rule. If you wanted all the same years from a single base year, say 1997

it would be as simple as:

 

if abs(year-1997) le 3;

 

But obviously you have different years in mind for each manager. What is the rule for selecting the base value for each manager?

I have to say that with your ID= 1001 and manager = a showing years 1992 to 2000 doesn't jib with any of my simple interpretations of -3 to +3.

hkim3677
Calcite | Level 5

The reference point year is the turnover year. 

So, I need to retain only -3 and +3 year observations based on the turnover point. Each managers and firm has different turnover year, so I do not have general turnover year. The turnover year varies across firm and manager.

 

 

novinosrin
Tourmaline | Level 20

@hkim3677 

First of all, Thank you for the very interesting question

Second of all, My apologies for the lack of attention on my part

Third of all, Sorry for the delay after early vow as I am not feeling well

fourth of all, I would wait for better answers from SAS champs like Art T, K sharp, Tom, mkeintz, chris nz  and others 

last of all, I will await for your feedback should you require any further changes and in the meantime will have my lunch 🙂

 

Data test;

input ID MANAGER $ YEAR POST_TURNOVER;

datalines;
1001 a 1990 0
1001 a 1991 0
1001 a 1992 0
1001 a 1993 0
1001 a 1994 0
1001 a 1995 1
1001 a 1996 1
1001 a 1997 1
1001 a 1998 1
1001 a 1999 1
1001 a 1995 0
1001 a 1996 0
1001 a 1997 0
1001 a 1998 0
1001 a 1999 0
1001 a 2000 0
1001 b 2000 1
1001 b 2001 1
1001 b 2002 1
1001 b 2003 1
1002 c 1989 0
1002 c 1990 0
1002 c 1991 0
1002 c 1992 0
1002 c 1993 1
1002 c 1994 1
1002 c 1995 1
1002 c 1996 1
1002 c 1997 1 
;

run;


data temp;
set test;
by id POST_TURNOVER notsorted;
if first.POST_TURNOVER and POST_TURNOVER then TURNOVER=1;
else TURNOVER=0;
run;


data want;
drop _:;
if _n_=1 then do;
if 0 then set temp;
dcl hash H () ;
   h.definekey  ("id","year") ;
   h.definedata ("MANAGER","YEAR","POST_TURNOVER","TURNOVER") ;
   h.definedone () ;
end;
set temp end=last;
by id;
retain _year;
if first.id then call missing(_year);
if _count=0 then h.replace();
if turnover=1 then do; 
	_year=year;
	_turnover=turnover;
	do _i=_year-3 to _year;
	if h.find(key:id, key:_i)=0 then output;
	end;
end;
if not turnover and _year then do;
	if _count<=3 then do;
	if _year+1<=year<=_year+3 then _count+1;
	if _count<=3 then output;
	end;
end;
if _count=3 then do; _count=0;call missing(_year);end;
run;
mkeintz
PROC Star

If I understand your question correctly, this is not a complicated task.

 

Do a SET statement with TEST as the object 2 times

  •  the first time just for turnover years
  • the second time for all year.

The SET statement will pass through all the turnover years for a given id first, allowing generation of minyear (minyear=first turnover year minus 3) and maxyear (equal last turnover year +3) .  The data step will then pass through ALL years for the same id, where comparison against minyear and maxyear can be tested.

 

Data test;
input ID MANAGER $ YEAR POST_TURNOVER;
datalines;
1001 a 1990 0
1001 a 1991 0
1001 a 1992 0
1001 a 1993 0
1001 a 1994 0
1001 a 1995 1
1001 a 1996 1
1001 a 1997 1
1001 a 1998 1
1001 a 1999 1
1001 a 1995 0
1001 a 1996 0
1001 a 1997 0
1001 a 1998 0
1001 a 1999 0
1001 a 2000 0
1001 b 2000 1
1001 b 2001 1
1001 b 2002 1
1001 b 2003 1
1002 c 1989 0
1002 c 1990 0
1002 c 1991 0
1002 c 1992 0
1002 c 1993 1
1002 c 1994 1
1002 c 1995 1
1002 c 1996 1
1002 c 1997 1 
run;

data want;
  set test (where=(post_turnover=1) in=in1)
      test (in=in2);
  by id manager;
  retain min_year max_year;
  if first.manager then call missing (min_year,max_year);
  if first.manager and in1=1 then min_year=year-3;
  else if in1 then max_year=year+3;

  if in2=1 and  (min_year<=year<=max_year);
run;

 

 

Notes

  1. The RETAIN statement tells SAS to not reset min_year and max_year to missing with every incoming record.
  2. Since all the turnover years are read first, testing on the associated IN1 dummy allows calculation of min_year and max_year.
  3. Then all the years are read (including a re-read of turnover years), allowing a subsetting IF of   (min_year<=year<=max_year).

The program allows for multi-year event spans (a sequence of post_turnover=1 records), but does not accommodate multiple events separated by non-event years.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 715 views
  • 1 like
  • 4 in conversation