Hi,
I have a dataset with employee id, start date, role code, role group and status.
empid | start dt | role code | role group | status |
101 | 01-Jan-20 | 231 | AD1 | A |
101 | 01-May-20 | 231 | AD1 | A |
101 | 01-Aug-20 | 232 | AD1 | A |
101 | 01-Sep-20 | 301 | A | |
101 | 01-Nov-20 | 301 | A | |
102 | 01-Apr-21 | 231 | AD1 | A |
102 | 01-Jun-21 | 231 | AD1 | D |
102 | 01-Aug-22 | 231 | AD1 | A |
From this table, I want to define another date variable 'effective'
- if employee has same role group or role code then effective should be minimum(very first) start date.
-if employee status='D' meaning not Active then effective date should be that of previous record.
-if there is any entry after status=D meaning(rejoined) then effective should be start dt of new role.
Expected final table is
empid | start dt | role code | role group | status | Effective |
101 | 01-Jan-20 | 231 | AD1 | A | 01-Jan-20 |
101 | 01-May-20 | 231 | AD1 | A | 01-Jan-20 |
101 | 01-Aug-20 | 232 | AD1 | A | 01-Jan-20 |
101 | 01-Sep-20 | 301 | A | 01-Sep-20 | |
101 | 01-Nov-20 | 301 | A | 01-Sep-20 | |
102 | 01-Apr-21 | 231 | AD1 | A | 01-Apr-21 |
102 | 01-Jun-21 | 231 | AD1 | D | 01-Apr-21 |
102 | 01-Aug-22 | 231 | AD1 | A | 01-Aug-22 |
I tried lag function, first. and last. , but nothing worked.
Your help is truly appreciated!
Thanks,
Assuming I understood what you mean.
data have;
infile cards expandtabs ;
input empid startdt :date11. rolecode rolegroup $ status $;
format startdt date11.;
cards;
101 01-Jan-20 231 AD1 A
101 01-May-20 231 AD1 A
101 01-Aug-20 232 AD1 A
101 01-Sep-20 301 . A
101 01-Nov-20 301 . A
102 01-Apr-21 231 AD1 A
102 01-Jun-21 231 AD1 D
102 01-Aug-22 231 AD1 A
;
data have;
set have;
by empid rolegroup status notsorted;
group+first.status;
run;
proc sql;
create table want as
select *,
case
when(status='A') then min(startdt)
when(status='D') then (select max(startdt) from have where empid=a.empid and startdt<a.startdt)
else .
end as effective format date11.
from have as a
group by group
order by empid,startdt;
quit;
You didn't show your code using the LAG function, but I bet you used it in the THEN clause of an IF ... THEN statement, which would mean that the lagged values would come from the last observation when the IF condition was true, not necessarily from the immediately previous observation (that's because the LAG function is a FIFO queue updater, not a "lookback").
Here is code that uses FIRST. and LAG functions suitable for your purpose, which allows you to have a single DATA step solution:
data have;
infile cards expandtabs ;
input empid startdt :date11. rolecode rolegroup $ status $;
format startdt date11.;
cards;
101 01-Jan-20 231 AD1 A
101 01-May-20 231 AD1 A
101 01-Aug-20 232 AD1 A
101 01-Sep-20 301 . A
101 01-Nov-20 301 . A
102 01-Apr-21 231 AD1 A
102 01-Jun-21 231 AD1 D
102 01-Aug-22 231 AD1 A
;
data want;
set have;
by empid ;
retain effective;
format effective date9.;
if first.empid then effective=startdt;
if first.empid=0 and rolecode^=lag(rolecode) and rolegroup^=lag(rolegroup) then effective=startdt;
if status='A' and lag(status)='D' then effective=startdt;
run;
Notice the second IF statement does not begin as ELSE if. Otherwise the values produced by the LAG functions would not be synchronized with the immediately preceding observations.
And if you want to have more extensive use of FIRST. and less use of LAG:
data want;
set have;
by empid rolecode notsorted;
set have (keep=empid rolegroup);
by empid rolegroup notsorted;
retain effective;
format effective date9.;
if first.empid then effective=startdt;
else if first.rolecode and first.rolegroup then effective=startdt;
if status='A' and lag(status)='D' then effective=startdt;
run;
Note that there are two pairs of SET + BY statements. The second SET rereads the observation read by the first. I do it that way because the program needs independently constructed FIRST.ROLECODE and FIRST.ROLEGROUP dummies. If both of those variables had been in a single BY statement, then they would not be independent. Detection of a FIRST. condition on any given BY variable would force a FIRST. condition on all subsequent vars in the BY statement.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.