BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MGray000
Calcite | Level 5

Hi, I'm wondering if anyone has a suggestion as to how to create variables reflecting the number of entries, exits, and their combination (exits + entries = total turnover) within groups over time. I'm afraid my expertise is too limited.

 

Below is a snapshot of the data (lefthand columns; variables: Member, Team, Year) and what I'm seeking to capture summarized on the right:

 

MemberTeamYear Three Variables to Create: entries, exit, and turnover
AdamA2000 First year/baseline - Group A: entries, exits and turnover (= exits + entries) all set to 0
AndreaA2000  
AnnA2000  
BobA2000  
JeffA2000  
JimA2000  
RalphB2000 First year/baseline - Group B: entries, exits and turnover (= exits + entries) all set to 0
Sam B2000  
SandyB2000  
ShellyB2000  
AdamA2001 Group A x Year 2001: 0 entries; 0 exits; 0 turnover (= exits + entries)
AndreaA2001  
AnnA2001  
BobA2001  
JeffA2001  
JimA2001  
RalphB2001 Group B x Year 2001: 0 entries; 0 exits; 0 turnover (= exits + entries)
Sam B2001  
SandyB2001  
ShellyB2001  
AdamA2002 note: Jeff exitedGroup A x Year 2002: 0 entries; 1 exit; 1 turnover (= exits + entries)
AndreaA2002  
AnnA2002  
BobA2002  
JimA2002  
JessicaB2002note: Jessica enteredGroup B x Year 2002: 1 entry; 0 exits; 1 turnover (= exits + entries)
RalphB2002  
Sam B2002  
SandyB2002  
ShellyB2002  

 

Any thoughts or suggestions are greatly appreciated!  Cheers!

 

MGray

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

I think you can take advantage of proc summary/means ability to find the minimum and maximum years for each team*member.  Assuming a member does not leave and return this this works - no sorting required:

 

proc summary data=have min max nway ;
  class team member;
  var year;
  output out=range min=minyear max=maxyear;
run;

data want (keep=team year entries exits turnover);
  array ent {2000:2002} _temporary_ (3*0);
  array ext {2000:2002} _temporary_ (3*0);
  set range;
  by team ;
  if minyear>2000 then ent{minyear}+1;    /*exclude baseline year from entry counts*/
  if maxyear<2002 then ext{maxyear+1}+1;  /*exclude last data year from exit counts*/
  
  if last.team then do year=2000 to 2002;
    entries=ent{year};  ent{year}=0;
    exits=ext{year};    ext{year}=0;
    turnover=entries+exits;
    output;
  end;
run;

 

Note the data step accumulates totals for minyear (for entries) and maxyear (for exists) in two arrays (each indexed by year).  When the last record for each team is encountered, that array is retrieved and output.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
art297
Opal | Level 21

Here is one way you could do your calculations:

 

proc sort data=have;
  by team member year;
run;

proc transpose data=have out=want (drop=_name_);
  by team member;
  var var;
  id year;
run;

data want (drop=i);
  set want;
  array years _2000-_2002;
  by team;
    entry=0;
    exit=0;
  do i=2 to 3;
    if years(i) and missing(years(i-1)) then entry=entry+1;
    else if missing(years(i)) and years(i-1) then exit=exit+1;
  end;
  turnover=entry+exit;
run;

proc means data=want sum;
  var entry exit turnover;
  by team;
run;

Art, CEO, AnalystFinder.com

 

MGray000
Calcite | Level 5
Art, thanks so much!  
The program produced an output dataset containing one observation per team, with the following variables:
Team EntryExitTurnover 
I realize now I erred earlier.  My data are longitudinal.  Hence, I should have specified the need for yearly entry, exit and turnover stats (i.e., team X year output).  My bad!  But as it happens, another response produced just that output. So, problem solved!
Again, I can't thank you enough for lending a hand.  Cheers!

Reeza
Super User

I see sample data, please post what the equivalent output should be. 

MGray000
Calcite | Level 5
Hi Reeza,
Thanks for your follow-up!  Clearly, I erred in not mentioning the need for a longitudinal solution (year X team output).  But as it happens, MKeintz posted a solution that produces just that.  So, I'm all set.  I so appreciate your time.  Cheers!
mkeintz
PROC Star

I think you can take advantage of proc summary/means ability to find the minimum and maximum years for each team*member.  Assuming a member does not leave and return this this works - no sorting required:

 

proc summary data=have min max nway ;
  class team member;
  var year;
  output out=range min=minyear max=maxyear;
run;

data want (keep=team year entries exits turnover);
  array ent {2000:2002} _temporary_ (3*0);
  array ext {2000:2002} _temporary_ (3*0);
  set range;
  by team ;
  if minyear>2000 then ent{minyear}+1;    /*exclude baseline year from entry counts*/
  if maxyear<2002 then ext{maxyear+1}+1;  /*exclude last data year from exit counts*/
  
  if last.team then do year=2000 to 2002;
    entries=ent{year};  ent{year}=0;
    exits=ext{year};    ext{year}=0;
    turnover=entries+exits;
    output;
  end;
run;

 

Note the data step accumulates totals for minyear (for entries) and maxyear (for exists) in two arrays (each indexed by year).  When the last record for each team is encountered, that array is retrieved and output.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
MGray000
Calcite | Level 5
Success!!  Thank you so much.  I'm indebted.
Cheers!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 3225 views
  • 0 likes
  • 4 in conversation