Hi All,
I'm infinitely grateful to folks who responded to an earlier query regarding creation of turnover variables (entries, exits, and total turnover [entries + exits]). A terrific solution was provided for a balanced dataset, or at least one in which the number of years was known and could be specified.
Alas, a second dataset contains unbalanced data. Start and end years differ; also, duration varies across teams. So, for example, four years of data might be available for one team; only three years for another. What's more, teams exhibit different start and end years. A sample of the data are provided below. To the right of the three variable columns (Team, Year, Member) are some notes reflecting the events that need capturing (again, exits, entries, and their total).
Sample data:
Team | Year | Member | ||
A | 2000 | Charlene | ||
A | 2000 | David | ||
A | 2000 | Jane | ||
A | 2000 | Paul | ||
A | 2001 | Charlene | ||
A | 2001 | David | ||
A | 2001 | Jane | ||
A | 2001 | Paul | ||
A | 2002 | Ann | Charlene exited; Ann entered | |
A | 2002 | David | ||
A | 2002 | Jane | ||
A | 2002 | Paul | ||
A | 2003 | Ann | ||
A | 2003 | David | ||
A | 2003 | Jane | ||
A | 2003 | Jim | Jim entered | |
A | 2003 | Paul | ||
B | 2000 | Betty | ||
B | 2000 | Joe | ||
B | 2000 | Suresh | ||
B | 2000 | Tom | ||
B | 2001 | Brandy | Betty exited; Brandy entered | |
B | 2001 | Joe | ||
B | 2001 | Suresh | Tom exited | |
C | 2002 | Gwen | ||
C | 2002 | Jamal | ||
C | 2002 | Peter | ||
C | 2002 | Susan | ||
C | 2003 | Edward | Edward entered | |
C | 2003 | Gwen | ||
C | 2003 | Jamal | ||
C | 2003 | Peter | ||
C | 2003 | Susan | ||
C | 2004 | Edward | ||
C | 2004 | Gwen | ||
C | 2004 | Jamal | ||
C | 2004 | Peter | Susan exited |
Whereas the input data are observations at the individual level of analysis, the output is aggregated at the team level and should have a Team x Year structure including three variables: Entry (# additions to the team in the most recent time period), Exit (# exits from the team in the most recent timer period), and Turnover (= Entry + Exit in the most recent time period). In other words ...
Team | Year | Entry | Exit | Turnover |
A | 2000 | . | . | . |
A | 2001 | 0 | 0 | 0 |
A | 2002 | 1 | 1 | 2 |
A | 2003 | 1 | 0 | 1 |
B | 2000 | . | . | . |
B | 2001 | 1 | 2 | 3 |
C | 2002 | . | . | . |
C | 2003 | 1 | 0 | 1 |
C | 2004 | 0 | 1 | 1 |
The first year (time period) for each team scores missing values since it cannot have incurred turnover.
Please know any thoughts or suggestions are immensely appreciated! Thanks!
Just as proc summary can provide the minyear and maxyear for each team*member, it can simultaneously provide minyear and maxyear for each team. This means you can compare each member's time span to their team timespan, instead of a global pre-known time range.
data have;
input Team $1. Year Member :$8.;
datalines;
A 2000 Charlene
A 2000 David
A 2000 Jane
A 2000 Paul
A 2001 Charlene
A 2001 David
A 2001 Jane
A 2001 Paul
A 2002 Ann Charlene exited/ Ann entered
A 2002 David
A 2002 Jane
A 2002 Paul
A 2003 Ann
A 2003 David
A 2003 Jane
A 2003 Jim Jim entered
A 2003 Paul
B 2000 Betty
B 2000 Joe
B 2000 Suresh
B 2000 Tom
B 2001 Brandy Betty exited, Brandy entered
B 2001 Joe
B 2001 Suresh Tom exited
C 2002 Gwen
C 2002 Jamal
C 2002 Peter
C 2002 Susan
C 2003 Edward Edward entered
C 2003 Gwen
C 2003 Jamal
C 2003 Peter
C 2003 Susan
C 2004 Edward
C 2004 Gwen
C 2004 Jamal
C 2004 Peter Susan exited
run;
proc summary data=have min max ;
class team member;
var year;
output out=range min=minyear max=maxyear;
run;
data want (keep=team year entries exits turnover);
array ent {1991:2010} _temporary_ (20*0);
array ext {1991:2010} _temporary_ (20*0);
merge range (where=(_type_=2) rename=(minyear=team_minyear maxyear=team_maxyear))
range (where=(_type_=3));
by team ;
if minyear>team_minyear then ent{minyear}+1; /*exclude baseline year from entry counts*/
if maxyear<team_maxyear then ext{maxyear+1}+1; /*exclude last data year from exit counts*/
if last.team then do year=team_minyear to team_maxyear;
entries=ent{year}; ent{year}=0;
exits=ext{year}; ext{year}=0;
turnover=entries+exits;
if year=team_minyear then call missing(entries,exits,turnover);
output;
end;
run;
Notes:
The nice thing is that the core of the program is changed very little.
Just as proc summary can provide the minyear and maxyear for each team*member, it can simultaneously provide minyear and maxyear for each team. This means you can compare each member's time span to their team timespan, instead of a global pre-known time range.
data have;
input Team $1. Year Member :$8.;
datalines;
A 2000 Charlene
A 2000 David
A 2000 Jane
A 2000 Paul
A 2001 Charlene
A 2001 David
A 2001 Jane
A 2001 Paul
A 2002 Ann Charlene exited/ Ann entered
A 2002 David
A 2002 Jane
A 2002 Paul
A 2003 Ann
A 2003 David
A 2003 Jane
A 2003 Jim Jim entered
A 2003 Paul
B 2000 Betty
B 2000 Joe
B 2000 Suresh
B 2000 Tom
B 2001 Brandy Betty exited, Brandy entered
B 2001 Joe
B 2001 Suresh Tom exited
C 2002 Gwen
C 2002 Jamal
C 2002 Peter
C 2002 Susan
C 2003 Edward Edward entered
C 2003 Gwen
C 2003 Jamal
C 2003 Peter
C 2003 Susan
C 2004 Edward
C 2004 Gwen
C 2004 Jamal
C 2004 Peter Susan exited
run;
proc summary data=have min max ;
class team member;
var year;
output out=range min=minyear max=maxyear;
run;
data want (keep=team year entries exits turnover);
array ent {1991:2010} _temporary_ (20*0);
array ext {1991:2010} _temporary_ (20*0);
merge range (where=(_type_=2) rename=(minyear=team_minyear maxyear=team_maxyear))
range (where=(_type_=3));
by team ;
if minyear>team_minyear then ent{minyear}+1; /*exclude baseline year from entry counts*/
if maxyear<team_maxyear then ext{maxyear+1}+1; /*exclude last data year from exit counts*/
if last.team then do year=team_minyear to team_maxyear;
entries=ent{year}; ent{year}=0;
exits=ext{year}; ext{year}=0;
turnover=entries+exits;
if year=team_minyear then call missing(entries,exits,turnover);
output;
end;
run;
Notes:
The nice thing is that the core of the program is changed very little.
mkeintz,
You are brilliant! I again extend thanks ... not only for the solution, but also the insights. As someone who is self-taught in SAS, and very much an amateur, I've learned a great deal from your (and others') posts. With deep appreciation, cheers,
MGray000
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.