Re: job code Tenure

soham_sas · Posted 08-01-2020 01:38 PM

Hi

i have a employee level transactions data which shows type of transaction and the date of the transaction that the employee did

i have one more data which has the employee job information( the table is shown below ) , i have to calculate the job code tenure of the employee

The logic to calculate the job code tenure is like this : lets say the employee did transactions on 13th and 16th july-2020

then against the 16th july i have to mark CA as the job code and the tenure will be (16july-15-jul) ,and this is because on 16th july the date less then 16th july is 15th july , and the job code was CA.
for 13th july i have to mark LA as the job code and the tenure will be (13th july- 13 Jan) , and this is because on 13th july the date less then 13th july is 13 jun , and the job code was LA which started from 13-jan-19

Please note that job code CA is repeated after the LA , we shouldn't consider these while calculating the tenure for 15th july.

i have also attached the below table in Excel format.

EMP_ID	Job_cd	cre_dt
1	CA	15-Jul-20
1	LA	13-Jun-20
1	LA	12-Jun-20
1	LA	02-May-20
1	LA	13-Jan-19
1	CA	12-Dec-19
1	CA	11-Oct-19
1	DA	01-Jan-18
1	DA	01-Jun-18
1	DA	15-Nov-18

mkeintz · Posted 08-01-2020 06:06 PM

Let's see if I understand:

One table (call it T) has one record per employee/transaction. It has employee id (ID), transaction date (TDATE) and various other transaction related variables. The other table (call it J, for which you provide a sample) is a table of some sort of employee job code change. It has one record per change for each employee. It has variables ID, jobcode-change-date (CRE_DT in your example), and job code (JOB_CD).

For some reason your table J appears to be sorted by ID and descending (!) CRE_DT.

Let's assume, however that table T is sorted by ID and ascending TDATE.

Now, if J is not very big, I would first sort it by ID and ascending CRE_DT, first, as here:

proc sort data=J  out=Jprime;
  by id cre_dt;
run;

data want;
  set Jprime (keep=id cre_dt rename=(cre_dt=tdate) in=inJ) T (in=inT);
  by id tdate ;
  /* if first.id then call missing (of _all_); */
  if inJ then set Jprime ;
  if inT;
run;

The data WANT step works this way:

The first SET statement interleave data from Jprime and T, sorted by ID and tdate (temporarily renaming CRE_DT to TDATE to satisfy the "by id tdate" statement. Importantly the variable JOB_CD from Jprime is NOT retrieved here.
Note that Jprime precedes T in the set statement, because you will want to carry forward the CR_JOB value (from dataset Jprime) to the subsequent transaction (from T). This could be tricky if the dates of the two datasets were identical, but putting Jprime before T in the set statement guarantees that the Jprime record is processed prior to the T record.
If the record in hand comes dataset Jprime, do a conditional set (IF inJ then set Jprime), to reread the J record, this time with CRE_DT not renamed and, crucially, this time also with the JOB_CD variable. Those JOB_CD and CRE_DT values will persist across subsequent T observations until another J record is encountered, and triggers this conditional set statement again.
Finally, do a subsetting if, keeping only instances matching a transaction record.

This program assumes that there will always be a J record preceding the earliest T record for each ID. If not, then de-comment the call missing statement.

Now, what if your J dataset is too big to sort, and you'd rather process it in its current pathologically sorted order? Then you could first read in all the J records, and save the job codes in a cre_dt-indexed array. Then when the T records are read start with cre_dt=tdate and go backwards until a non-blank job_cd is retrieved from the array:

%let mindate=01jan2015;
%let maxdate=31dec2019;

data want;
  set J (in=inJ)  T (in=inT);
  by id;
  array jcodes {%sysevalf("&mindate"d):%sysevalf("&maxdate"d)} $2 _temporary_ ;
  if first.id then call missing(of jcodes{*});
  if inJ then jcodes{cre_dt}=job_cd;

  if inT;
  do cre_dt=tdate to "&mindate"d by -1 until(job_cd^=' ');
     job_cd=jcodes{cre_dt};
  end;
run;

Set the macrovars mindate and maxdate to value you know will span the entire range of dates in your datasets.
The array JCODES has be set to be a array of $2 variables, since I assume that is the length of JOB_CD. It is a _temporary_ array, meaning it won't appear on the output dataset, and its values are retained accross subsequent observations. The array index lower bound is the underlying value for 01jan2015 and the upper bound is 31dec2019.
the BY statement is now only "by id", not "by id tdate", since you want to read ALL of the J records prior to reading the first T record.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

soham_sas · Posted 08-02-2020 07:48 AM

@mkeintz Thanks for your Reply.

The current logic calculates the date correctly for the transaction of 16th july (i.e the cre_dt is 15th July)

but for the transaction of 13th july the cre_dt should be 13th jan-19 because the job code LA started on 13 jan-19

current logic calculates the cre_dt for the transaction of 13 july as 13 jun-2020

Thanks again for your help..

mkeintz · Posted 08-02-2020 04:18 PM

I'd be happy to address this. Please provide SAS data steps with relevant parts of the two tables, so that I can actually test the code.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

job code Tenure