SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

How to calculate the new occurrence of a data point that didn't exist in the prior period

Reply
Contributor
Posts: 59

How to calculate the new occurrence of a data point that didn't exist in the prior period

Hello all, 

 

I want to data that follows teams across many years of data. I want to calculate both (a) when new members join the team and (b) when members leave the team. For example, if in the year 2007, 2 members are new to the data set that did not exist in the year 2006, I want to count those 2 members. Additionally, if in the year 2007, 1 member is no longer a part of the data set that did exist in the year 2006, I want to count that as well. I have two character variables that I can use to calculate this but I'm unsure of how to write the code, so I'm hoping you can help me. 

 

My data is like so: 

 

TeamID        Year            TeamMemberName(char var)      TeamMemberID(char var)

1                  2006            TimJones                                      11

1                  2006            JoeSmith                                       22

1                  2006            BobBell                                         33

1                  2007            TimJones                                      11

1                  2007            JoeSmith                                       22

1                  2007            JayWatt                                         44

1                  2007            JohnJay                                         55

1                  2008            TimJones                                      11

1                  2008            JoeSmith                                       22

1                  2008            JayWatt                                         44

 

(except I have the data for many teams)

 

 

I want: 

 

TeamID   Year     NumberofNewMembers     NumberthatLeft

1             2007         2                                            1          

 

1             2008         0                                            1

 

 

 

 

Could someone please advise with the code?

 

Thanks

 

Super User
Posts: 23,237

Re: How to calculate the new occurrence of a data point that didn't exist in the prior period

  • Sort by Team Member and Year
  • Flag each persons first year using FIRST. type logic
  • Otherwise, person is flagged as returning. 
  • Flag each persons last year as LEFT (how to deal with last year of data?)
  • Summarize values using PROC MEANS or SQL. 

 

Here's some rough logic that should get you started. 

 


r4321 wrote:

Hello all, 

 

I want to data that follows teams across many years of data. I want to calculate both (a) when new members join the team and (b) when members leave the team. For example, if in the year 2007, 2 members are new to the data set that did not exist in the year 2006, I want to count those 2 members. Additionally, if in the year 2007, 1 member is no longer a part of the data set that did exist in the year 2006, I want to count that as well. I have two character variables that I can use to calculate this but I'm unsure of how to write the code, so I'm hoping you can help me. 

 

My data is like so: 

 

TeamID        Year            TeamMemberName(char var)      TeamMemberID(char var)

1                  2006            TimJones                                      11

1                  2006            JoeSmith                                       22

1                  2006            BobBell                                         33

1                  2007            TimJones                                      11

1                  2007            JoeSmith                                       22

1                  2007            JayWatt                                         44

1                  2007            JohnJay                                         55

1                  2008            TimJones                                      11

1                  2008            JoeSmith                                       22

1                  2008            JayWatt                                         44

 

(except I have the data for many teams)

 

 

I want: 

 

TeamID   Year     NumberofNewMembers     NumberthatLeft

1             2007         2                                            1          

 

1             2008         0                                            1

 

 

 

 

Could someone please advise with the code?

 

Thanks

 


 

Contributor
Posts: 59

Re: How to calculate the new occurrence of a data point that didn't exist in the prior period

I can follow the logic, but the main thing I'm having difficulty with specifically is how to write the code that compares the current year to the previous year to find (a) the number of new team members and (b) the number of team members that left.

Thanks
Super User
Posts: 23,237

Re: How to calculate the new occurrence of a data point that didn't exist in the prior period

[ Edited ]

You don't. You identify each persons first year with the FIRST logic. 

 

If you're not familiar with FIRST/LAST logic then you should look into it. Basically it will identify the first/last record for a group automatically, which is essentially what you're doing, if you think about it slightly differently. Because if it's not first or last year, then that means they were here last year and are here next year. But you do need to think about how to report the last year, because everyone in the last year will be identified as last. 

 

Quick demo of first/last

 

proc sort data=sashelp.class out=class;
by sex age;
run;

data youngest_by_sex;
set class;
by sex;

if first.sex then output;
run;

data oldest_by_sex;
set class;
by sex;

if last.sex then output;
run;

The documentation has an entire chapter that is quite informative.

http://documentation.sas.com/?docsetId=lrcon&docsetTarget=n138da4gme3zb7n1nifpfhqv7clq.htm&docsetVer...

 

If you post what you have so far I'm happy to help. 

 

 

PROC Star
Posts: 1,558

Re: How to calculate the new occurrence of a data point that didn't exist in the prior period

Ok @r4321 I'm afraid I did this in a  hurry as I am running for my midterm exam at my college. This should work, however feel free if there is any changes, I will review later in the evening. Have fun!

data have;
input TeamID        Year            TeamMemberName $     TeamMemberID $;
datalines;
1                  2006            TimJones                                      11
1                  2006            JoeSmith                                       22
1                  2006            BobBell                                         33
1                  2007            TimJones                                      11
1                  2007            JoeSmith                                       22
1                  2007            JayWatt                                         44
1                  2007            JohnJay                                         55
1                  2008            TimJones                                      11
1                  2008            JoeSmith                                       22
1                  2008            JayWatt                                         44
;

data want;
if _n_=1 then do;
if 0 then set have;
  dcl hash h();
  h.definekey('teamid','TeamMemberID');
  h.definedata('teamid','TeamMemberID');
  h.definedone();
  end;
  _n=0;
  do until(last.teamid);
  	do _n_=1 by 1 until(last.year);
  		set have end=last;
  		by teamid year TeamMemberID;
  		array t(100) $;
  		if first.year then do;call missing(of t(*)); _n+1;NumberofNewMembers=0;NumberthatLeft=0;end;
  		if _n=1 then h.add();
  		else if _n>1 then 
			do;
  				if h.check()=0 then do;t(_n_)=TeamMemberID; h.remove();end;
  				else if h.check() ne 0 then do; NumberofNewMembers+1;t(_n_)=TeamMemberID;end;
				if last.year then 
				do; 
					NumberthatLeft = h.num_items;h.clear();
					do _i=1 to _n_;
					h.add(key:teamid,key: t(_i),data:teamid,data:t(_i));
					end;
					output;
				end;
			end;
	end;
  end;
keep TeamID   Year     NumberofNewMembers     NumberthatLeft;
run;
Super User
Posts: 10,681

Re: How to calculate the new occurrence of a data point that didn't exist in the prior period

Very interesting Question.

 

data have;
input TeamID        Year            TeamMemberName $     TeamMemberID $;
datalines;
1                  2006            TimJones                                      11
1                  2006            JoeSmith                                       22
1                  2006            BobBell                                         33
1                  2007            TimJones                                      11
1                  2007            JoeSmith                                       22
1                  2007            JayWatt                                         44
1                  2007            JohnJay                                         55
1                  2008            TimJones                                      11
1                  2008            JoeSmith                                       22
1                  2008            JayWatt                                         44
;
data have;
 set have;
 by teamid year;
 if first.teamid then group=0;
 group+first.year;
run;
data want;
 if _n_=1 then do;
  if 0 then set have;
  declare hash h();
  h.definekey('teammemberid');
  h.definedone();
 end;
 

NumberofNewMembers=0;
do until(last.year);
 set have ;
 by teamid year;
 if  h.check() ne 0 then NumberofNewMembers+1;    
 rc=h.remove();
end;

if group ne 1 then do;
 NumberthatLeft=h.num_items;
 output;
end;
h.clear();

do until(last.year);
 set have ;
 by teamid year;
 h.ref();
end;
 keep TeamID   Year     NumberofNewMembers     NumberthatLeft;
run;
Ask a Question
Discussion stats
  • 5 replies
  • 150 views
  • 1 like
  • 4 in conversation