Hi, I'm wondering if anyone has a suggestion as to how to create variables reflecting the number of entries, exits, and their combination (exits + entries = total turnover) within groups over time. I'm afraid my expertise is too limited.
Below is a snapshot of the data (lefthand columns; variables: Member, Team, Year) and what I'm seeking to capture summarized on the right:
Member | Team | Year | Three Variables to Create: entries, exit, and turnover | |
Adam | A | 2000 | First year/baseline - Group A: entries, exits and turnover (= exits + entries) all set to 0 | |
Andrea | A | 2000 | ||
Ann | A | 2000 | ||
Bob | A | 2000 | ||
Jeff | A | 2000 | ||
Jim | A | 2000 | ||
Ralph | B | 2000 | First year/baseline - Group B: entries, exits and turnover (= exits + entries) all set to 0 | |
Sam | B | 2000 | ||
Sandy | B | 2000 | ||
Shelly | B | 2000 | ||
Adam | A | 2001 | Group A x Year 2001: 0 entries; 0 exits; 0 turnover (= exits + entries) | |
Andrea | A | 2001 | ||
Ann | A | 2001 | ||
Bob | A | 2001 | ||
Jeff | A | 2001 | ||
Jim | A | 2001 | ||
Ralph | B | 2001 | Group B x Year 2001: 0 entries; 0 exits; 0 turnover (= exits + entries) | |
Sam | B | 2001 | ||
Sandy | B | 2001 | ||
Shelly | B | 2001 | ||
Adam | A | 2002 | note: Jeff exited | Group A x Year 2002: 0 entries; 1 exit; 1 turnover (= exits + entries) |
Andrea | A | 2002 | ||
Ann | A | 2002 | ||
Bob | A | 2002 | ||
Jim | A | 2002 | ||
Jessica | B | 2002 | note: Jessica entered | Group B x Year 2002: 1 entry; 0 exits; 1 turnover (= exits + entries) |
Ralph | B | 2002 | ||
Sam | B | 2002 | ||
Sandy | B | 2002 | ||
Shelly | B | 2002 |
Any thoughts or suggestions are greatly appreciated! Cheers!
MGray
I think you can take advantage of proc summary/means ability to find the minimum and maximum years for each team*member. Assuming a member does not leave and return this this works - no sorting required:
proc summary data=have min max nway ;
class team member;
var year;
output out=range min=minyear max=maxyear;
run;
data want (keep=team year entries exits turnover);
array ent {2000:2002} _temporary_ (3*0);
array ext {2000:2002} _temporary_ (3*0);
set range;
by team ;
if minyear>2000 then ent{minyear}+1; /*exclude baseline year from entry counts*/
if maxyear<2002 then ext{maxyear+1}+1; /*exclude last data year from exit counts*/
if last.team then do year=2000 to 2002;
entries=ent{year}; ent{year}=0;
exits=ext{year}; ext{year}=0;
turnover=entries+exits;
output;
end;
run;
Note the data step accumulates totals for minyear (for entries) and maxyear (for exists) in two arrays (each indexed by year). When the last record for each team is encountered, that array is retrieved and output.
Here is one way you could do your calculations:
proc sort data=have; by team member year; run; proc transpose data=have out=want (drop=_name_); by team member; var var; id year; run; data want (drop=i); set want; array years _2000-_2002; by team; entry=0; exit=0; do i=2 to 3; if years(i) and missing(years(i-1)) then entry=entry+1; else if missing(years(i)) and years(i-1) then exit=exit+1; end; turnover=entry+exit; run; proc means data=want sum; var entry exit turnover; by team; run;
Art, CEO, AnalystFinder.com
I see sample data, please post what the equivalent output should be.
I think you can take advantage of proc summary/means ability to find the minimum and maximum years for each team*member. Assuming a member does not leave and return this this works - no sorting required:
proc summary data=have min max nway ;
class team member;
var year;
output out=range min=minyear max=maxyear;
run;
data want (keep=team year entries exits turnover);
array ent {2000:2002} _temporary_ (3*0);
array ext {2000:2002} _temporary_ (3*0);
set range;
by team ;
if minyear>2000 then ent{minyear}+1; /*exclude baseline year from entry counts*/
if maxyear<2002 then ext{maxyear+1}+1; /*exclude last data year from exit counts*/
if last.team then do year=2000 to 2002;
entries=ent{year}; ent{year}=0;
exits=ext{year}; ext{year}=0;
turnover=entries+exits;
output;
end;
run;
Note the data step accumulates totals for minyear (for entries) and maxyear (for exists) in two arrays (each indexed by year). When the last record for each team is encountered, that array is retrieved and output.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.