BookmarkSubscribeRSS Feed
soham_sas
Quartz | Level 8

Hi

 i have a employee level transactions data which shows type of transaction and the date of the transaction that the employee did

i have one more data which has the employee job information( the table is shown below ) , i have to calculate the job code tenure of the employee 

The logic to calculate the job code tenure is like this : lets say the employee did transactions on 13th and 16th july-2020

  • then against the 16th july i have to mark CA as the job code and the tenure will be (16july-15-jul) ,and this is because on 16th july the date less then 16th july is 15th july , and the job code was CA.
  • for 13th july i have to mark LA as the job code and the tenure will be (13th july- 13 Jan) , and this is because on 13th july the date less then 13th july is 13 jun , and the job code was LA which started from 13-jan-19

Please note that job code CA is repeated after the LA , we shouldn't consider these while calculating the tenure for 15th july.

 

i have also attached the below table in Excel format.

 

EMP_IDJob_cdcre_dt
1CA15-Jul-20
1LA13-Jun-20
1LA12-Jun-20
1LA02-May-20
1LA13-Jan-19
1CA12-Dec-19
1CA11-Oct-19
1DA01-Jan-18
1DA01-Jun-18
1DA15-Nov-18
3 REPLIES 3
mkeintz
PROC Star

Let's see if I understand:

 

One table (call it T) has one record per employee/transaction.  It has employee id (ID), transaction date (TDATE) and various other transaction related variables.   The other table (call it J, for which you provide a sample) is a table of some sort of employee job code  change.  It has one record per change for each employee.  It has variables ID,  jobcode-change-date (CRE_DT in your example), and job code (JOB_CD).

 

For some reason your table J appears to be sorted by ID  and descending (!) CRE_DT.

 

Let's assume, however that table T is sorted by ID and ascending TDATE.

 

Now, if J is not very big, I would first sort it by ID and ascending CRE_DT, first, as here:

proc sort data=J  out=Jprime;
  by id cre_dt;
run;

data want;
  set Jprime (keep=id cre_dt rename=(cre_dt=tdate) in=inJ) T (in=inT);
  by id tdate ;
/* if first.id then call missing (of _all_); */ if inJ then set Jprime ; if inT; run;

The data WANT step works this way:

  1. The first SET statement interleave data from Jprime and T, sorted by ID and tdate (temporarily renaming CRE_DT to TDATE to satisfy the "by id tdate" statement.  Importantly the variable JOB_CD from Jprime is NOT retrieved here.

  2. Note that Jprime precedes T in the set statement, because you will want to carry forward the CR_JOB value (from dataset Jprime) to the subsequent transaction (from T).   This could be tricky if the dates of the two datasets were identical, but putting Jprime before T in the set statement guarantees that the Jprime record is processed prior to the T record.

  3. If the record in hand comes dataset Jprime, do a conditional set (IF inJ then set Jprime), to reread the J record, this time with CRE_DT not renamed and, crucially, this time also with the JOB_CD variable.  Those JOB_CD and CRE_DT values will persist across subsequent T observations until another J record is encountered, and triggers this conditional set statement again.

  4. Finally, do a subsetting if, keeping only instances matching a transaction record.

This program assumes that there will always be a J record preceding the earliest T record for each ID.  If not, then de-comment the call missing statement.

 

 

Now, what if your J dataset is too big to sort, and you'd rather process it in its current pathologically sorted order?  Then you could first read in all the J records, and save the job codes in a cre_dt-indexed array.  Then when the T records are read start with cre_dt=tdate and go backwards until a non-blank job_cd is retrieved from the array:

%let mindate=01jan2015;
%let maxdate=31dec2019;

data want; set J (in=inJ) T (in=inT); by id; array jcodes {%sysevalf("&mindate"d):%sysevalf("&maxdate"d)} $2 _temporary_ ; if first.id then call missing(of jcodes{*}); if inJ then jcodes{cre_dt}=job_cd; if inT; do cre_dt=tdate to "&mindate"d by -1 until(job_cd^=' '); job_cd=jcodes{cre_dt}; end; run;
  1. Set the macrovars mindate and maxdate to value you know will span the entire range of dates in your datasets.
  2. The array JCODES has be set to be a array of $2 variables, since I assume that is the length of JOB_CD.  It is a _temporary_ array, meaning it won't appear on the output dataset, and its values are retained accross subsequent observations.  The array index lower bound is the underlying value for 01jan2015 and the upper bound is 31dec2019. 
  3. the BY statement is now only "by id", not "by id tdate", since you want to read ALL of the J records prior to reading the first T record.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
soham_sas
Quartz | Level 8

@mkeintz  Thanks for your Reply.

The current logic calculates the date correctly for the transaction of 16th july (i.e the cre_dt is 15th July)

but for the transaction of 13th july the cre_dt should be 13th jan-19 because the job code LA started on 13 jan-19

current logic calculates the cre_dt for the transaction of 13 july as 13 jun-2020

 

Thanks again for your help..

mkeintz
PROC Star

I'd be happy to address this.  Please provide SAS data steps with relevant parts of the two tables, so that I can actually test the code. 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 679 views
  • 2 likes
  • 2 in conversation