Hi
i have a employee level transactions data which shows type of transaction and the date of the transaction that the employee did
i have one more data which has the employee job information( the table is shown below ) , i have to calculate the job code tenure of the employee
The logic to calculate the job code tenure is like this : lets say the employee did transactions on 13th and 16th july-2020
Please note that job code CA is repeated after the LA , we shouldn't consider these while calculating the tenure for 15th july.
i have also attached the below table in Excel format.
EMP_ID | Job_cd | cre_dt |
1 | CA | 15-Jul-20 |
1 | LA | 13-Jun-20 |
1 | LA | 12-Jun-20 |
1 | LA | 02-May-20 |
1 | LA | 13-Jan-19 |
1 | CA | 12-Dec-19 |
1 | CA | 11-Oct-19 |
1 | DA | 01-Jan-18 |
1 | DA | 01-Jun-18 |
1 | DA | 15-Nov-18 |
Let's see if I understand:
One table (call it T) has one record per employee/transaction. It has employee id (ID), transaction date (TDATE) and various other transaction related variables. The other table (call it J, for which you provide a sample) is a table of some sort of employee job code change. It has one record per change for each employee. It has variables ID, jobcode-change-date (CRE_DT in your example), and job code (JOB_CD).
For some reason your table J appears to be sorted by ID and descending (!) CRE_DT.
Let's assume, however that table T is sorted by ID and ascending TDATE.
Now, if J is not very big, I would first sort it by ID and ascending CRE_DT, first, as here:
proc sort data=J out=Jprime;
by id cre_dt;
run;
data want;
set Jprime (keep=id cre_dt rename=(cre_dt=tdate) in=inJ) T (in=inT);
by id tdate ;
/* if first.id then call missing (of _all_); */
if inJ then set Jprime ;
if inT;
run;
The data WANT step works this way:
This program assumes that there will always be a J record preceding the earliest T record for each ID. If not, then de-comment the call missing statement.
Now, what if your J dataset is too big to sort, and you'd rather process it in its current pathologically sorted order? Then you could first read in all the J records, and save the job codes in a cre_dt-indexed array. Then when the T records are read start with cre_dt=tdate and go backwards until a non-blank job_cd is retrieved from the array:
%let mindate=01jan2015;
%let maxdate=31dec2019;
data want;
set J (in=inJ) T (in=inT);
by id;
array jcodes {%sysevalf("&mindate"d):%sysevalf("&maxdate"d)} $2 _temporary_ ;
if first.id then call missing(of jcodes{*});
if inJ then jcodes{cre_dt}=job_cd;
if inT;
do cre_dt=tdate to "&mindate"d by -1 until(job_cd^=' ');
job_cd=jcodes{cre_dt};
end;
run;
@mkeintz Thanks for your Reply.
The current logic calculates the date correctly for the transaction of 16th july (i.e the cre_dt is 15th July)
but for the transaction of 13th july the cre_dt should be 13th jan-19 because the job code LA started on 13 jan-19
current logic calculates the cre_dt for the transaction of 13 july as 13 jun-2020
Thanks again for your help..
I'd be happy to address this. Please provide SAS data steps with relevant parts of the two tables, so that I can actually test the code.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.