Hi, I'm wondering if anyone has a suggestion as to how to create variables reflecting the number of entries, exits, and their combination (exits + entries = total turnover) within groups over time. I'm afraid my expertise is too limited.
Below is a snapshot of the data (lefthand columns; variables: Member, Team, Year) and what I'm seeking to capture summarized on the right:
Member | Team | Year | Three Variables to Create: entries, exit, and turnover | |
Adam | A | 2000 | First year/baseline - Group A: entries, exits and turnover (= exits + entries) all set to 0 | |
Andrea | A | 2000 | ||
Ann | A | 2000 | ||
Bob | A | 2000 | ||
Jeff | A | 2000 | ||
Jim | A | 2000 | ||
Ralph | B | 2000 | First year/baseline - Group B: entries, exits and turnover (= exits + entries) all set to 0 | |
Sam | B | 2000 | ||
Sandy | B | 2000 | ||
Shelly | B | 2000 | ||
Adam | A | 2001 | Group A x Year 2001: 0 entries; 0 exits; 0 turnover (= exits + entries) | |
Andrea | A | 2001 | ||
Ann | A | 2001 | ||
Bob | A | 2001 | ||
Jeff | A | 2001 | ||
Jim | A | 2001 | ||
Ralph | B | 2001 | Group B x Year 2001: 0 entries; 0 exits; 0 turnover (= exits + entries) | |
Sam | B | 2001 | ||
Sandy | B | 2001 | ||
Shelly | B | 2001 | ||
Adam | A | 2002 | note: Jeff exited | Group A x Year 2002: 0 entries; 1 exit; 1 turnover (= exits + entries) |
Andrea | A | 2002 | ||
Ann | A | 2002 | ||
Bob | A | 2002 | ||
Jim | A | 2002 | ||
Jessica | B | 2002 | note: Jessica entered | Group B x Year 2002: 1 entry; 0 exits; 1 turnover (= exits + entries) |
Ralph | B | 2002 | ||
Sam | B | 2002 | ||
Sandy | B | 2002 | ||
Shelly | B | 2002 |
Any thoughts or suggestions are greatly appreciated! Cheers!
MGray
I think you can take advantage of proc summary/means ability to find the minimum and maximum years for each team*member. Assuming a member does not leave and return this this works - no sorting required:
proc summary data=have min max nway ;
class team member;
var year;
output out=range min=minyear max=maxyear;
run;
data want (keep=team year entries exits turnover);
array ent {2000:2002} _temporary_ (3*0);
array ext {2000:2002} _temporary_ (3*0);
set range;
by team ;
if minyear>2000 then ent{minyear}+1; /*exclude baseline year from entry counts*/
if maxyear<2002 then ext{maxyear+1}+1; /*exclude last data year from exit counts*/
if last.team then do year=2000 to 2002;
entries=ent{year}; ent{year}=0;
exits=ext{year}; ext{year}=0;
turnover=entries+exits;
output;
end;
run;
Note the data step accumulates totals for minyear (for entries) and maxyear (for exists) in two arrays (each indexed by year). When the last record for each team is encountered, that array is retrieved and output.
Here is one way you could do your calculations:
proc sort data=have; by team member year; run; proc transpose data=have out=want (drop=_name_); by team member; var var; id year; run; data want (drop=i); set want; array years _2000-_2002; by team; entry=0; exit=0; do i=2 to 3; if years(i) and missing(years(i-1)) then entry=entry+1; else if missing(years(i)) and years(i-1) then exit=exit+1; end; turnover=entry+exit; run; proc means data=want sum; var entry exit turnover; by team; run;
Art, CEO, AnalystFinder.com
I see sample data, please post what the equivalent output should be.
I think you can take advantage of proc summary/means ability to find the minimum and maximum years for each team*member. Assuming a member does not leave and return this this works - no sorting required:
proc summary data=have min max nway ;
class team member;
var year;
output out=range min=minyear max=maxyear;
run;
data want (keep=team year entries exits turnover);
array ent {2000:2002} _temporary_ (3*0);
array ext {2000:2002} _temporary_ (3*0);
set range;
by team ;
if minyear>2000 then ent{minyear}+1; /*exclude baseline year from entry counts*/
if maxyear<2002 then ext{maxyear+1}+1; /*exclude last data year from exit counts*/
if last.team then do year=2000 to 2002;
entries=ent{year}; ent{year}=0;
exits=ext{year}; ext{year}=0;
turnover=entries+exits;
output;
end;
run;
Note the data step accumulates totals for minyear (for entries) and maxyear (for exists) in two arrays (each indexed by year). When the last record for each team is encountered, that array is retrieved and output.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.