BookmarkSubscribeRSS Feed
ItsMeAG
Fluorite | Level 6

Hi,

I have a dataset with employee id, start date, role code, role group and status. 

empidstart dtrole coderole groupstatus
10101-Jan-20231AD1A
10101-May-20231AD1A
10101-Aug-20232AD1A
10101-Sep-20301 A
10101-Nov-20301 A
10201-Apr-21231AD1A
10201-Jun-21231AD1D
10201-Aug-22231AD1A

 

 

From this table, I want to define another date variable 'effective'

- if employee has same role group or role code then effective should be minimum(very first) start  date.

-if employee status='D' meaning not Active then effective date should be that of previous record.

-if there is any entry after status=D meaning(rejoined) then effective should be start dt of new role.

 

Expected final table is 

empidstart dtrole coderole groupstatusEffective
10101-Jan-20231AD1A01-Jan-20
10101-May-20231AD1A01-Jan-20
10101-Aug-20232AD1A01-Jan-20
10101-Sep-20301 A01-Sep-20
10101-Nov-20301 A01-Sep-20
10201-Apr-21231AD1A01-Apr-21
10201-Jun-21231AD1D01-Apr-21
10201-Aug-22231AD1A01-Aug-22

 

I tried lag function, first. and last. , but nothing worked.

Your help is truly appreciated!

 

Thanks,

2 REPLIES 2
Ksharp
Super User

Assuming I understood what you mean.

 

data have;
infile cards expandtabs ;
input empid	startdt :date11.	rolecode 	rolegroup $ status $;
format startdt date11.;
cards;
101	01-Jan-20	231	AD1	A
101	01-May-20	231	AD1	A
101	01-Aug-20	232	AD1	A
101	01-Sep-20	301	 .	A
101	01-Nov-20	301	 .	A
102	01-Apr-21	231	AD1	A
102	01-Jun-21	231	AD1	D
102	01-Aug-22	231	AD1	A
;
data have;
 set have;
 by empid rolegroup status notsorted;
 group+first.status;
run;
proc sql;
create table want as
select *,
case 
when(status='A') then min(startdt)
when(status='D') then (select max(startdt) from have where empid=a.empid and startdt<a.startdt)
else . 
end as  effective format date11.
 from have as a
  group by group
   order by empid,startdt;
quit;
mkeintz
PROC Star

You didn't show your code using the LAG function, but I bet you used it in the THEN clause of an IF ... THEN statement, which would mean that the lagged values would come from the last observation when the IF condition was true, not necessarily from the immediately previous observation (that's because the LAG function is a FIFO queue updater, not a "lookback").

 

Here is code that uses FIRST.  and LAG functions suitable for your purpose, which allows you to have a single DATA step solution:

data have;
infile cards expandtabs ;
input empid	startdt :date11.	rolecode 	rolegroup $ status $;
format startdt date11.;
cards;
101	01-Jan-20	231	AD1	A
101	01-May-20	231	AD1	A
101	01-Aug-20	232	AD1	A
101	01-Sep-20	301	 .	A
101	01-Nov-20	301	 .	A
102	01-Apr-21	231	AD1	A
102	01-Jun-21	231	AD1	D
102	01-Aug-22	231	AD1	A
;

data want;
  set have;
  by empid ;

  retain effective;
  format effective date9.;

  if first.empid then effective=startdt;
  if first.empid=0 and rolecode^=lag(rolecode) and rolegroup^=lag(rolegroup) then effective=startdt;
  if status='A' and lag(status)='D' then effective=startdt;
run;

 

Notice the second IF statement does not begin as ELSE if.  Otherwise the values produced by the LAG functions would not be synchronized with the immediately preceding observations.

 

And if you want to have more extensive use of FIRST. and less use of LAG:

 

data want;
  set have;
  by empid rolecode notsorted;

  set have (keep=empid rolegroup);
  by empid rolegroup notsorted;

  retain effective;
  format effective date9.;

  if first.empid then effective=startdt;
  else if first.rolecode and first.rolegroup then effective=startdt;

  if status='A' and lag(status)='D' then effective=startdt;
run;

Note that there are two pairs of SET + BY statements.  The second SET rereads the observation read by the first.  I do it that way because the program needs independently constructed FIRST.ROLECODE and FIRST.ROLEGROUP dummies.  If both of those variables had been in a single BY statement, then they would not be independent.  Detection of a FIRST. condition on any given BY variable would force a FIRST. condition on all subsequent vars in the BY statement.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 275 views
  • 0 likes
  • 3 in conversation