Hello Community,
I am wondering about how to process those records where start dates are different but end dates are the same, such that the code take the record with the longest days in the date range. In the example below, the problematic employee_id is 97531.
/*Example of initial dataset.*/ data example; input employee_id $ startdate :DATE8. enddate :DATE8.; format startdate enddate DATE8.; datalines; 12345 21JUN20 24JUN20 12345 28JUN20 18JUL20 12345 29JUL20 14AUG20 12345 09NOV20 20NOV20 67890 24APR20 07JUN20 67890 24JUN20 08JUL20 67890 21JUL20 26AUG20 97531 30APR17 05MAY17 97531 01MAY17 05MAY17 ; run; proc print data=example; run;
This person is contributing two observations to the dataset: Obs 8 and Obs 9. Since enddate is 05May17 for both, I want to keep Obs 8 and drop Obs 9. This is because I want to keep the observation that has the longest days, in this case it is Obs 8, in order words the one where startdate is 30Apr17.
Below is the code for how I envision the ending dataset to look like:
/*Example of eventual dataset.*/ data want; input employee_id $ startdate :DATE8. enddate :DATE8. final_startdate :DATE8. final_enddate :DATE8. drop $; format startdate enddate final_startdate final_enddate DATE8.; datalines; 12345 21JUN20 24JUN20 21JUN20 24JUN20 no 12345 28JUN20 18JUL20 28JUN20 18JUL20 no 12345 29JUL20 14AUG20 29JUL20 14AUG20 no 12345 09NOV20 20NOV20 09NOV20 20NOV20 no 67890 24APR20 07JUN20 24APR20 07JUN20 no 67890 24JUN20 08JUL20 24JUN20 08JUL20 no 67890 21JUL20 26AUG20 21JUL20 26AUG20 no 97531 30APR17 05MAY17 . . yes 97531 01MAY17 05MAY17 01MAY17 05MAY17 no ; run; proc print data=want; run;
I would appreciate any suggestions in how to approach this.
Thank you!
Is this what you're looking for?
data example;
input employee_id $ startdate :DATE8. enddate :DATE8.;
format startdate enddate DATE8.;
datalines;
12345 21JUN20 24JUN20
12345 28JUN20 18JUL20
12345 29JUL20 14AUG20
12345 09NOV20 20NOV20
67890 24APR20 07JUN20
67890 24JUN20 08JUL20
67890 21JUL20 26AUG20
97531 30APR17 05MAY17
97531 01MAY17 05MAY17
;
run;
proc sort data=example
out=example_sorted;
by employee_id enddate descending startdate;
run;
data want;
set example_sorted;
by employee_id enddate;
format final_startdate final_enddate date8. drop $8.;
if first.enddate
then do;
final_startdate = startdate;
final_enddate = enddate;
drop = "no";
end;
else
do;
final_startdate = .;
final_enddate = .;
drop = "yes";
end;
run;
proc print data=want;
run;
Is this what you're looking for?
data example;
input employee_id $ startdate :DATE8. enddate :DATE8.;
format startdate enddate DATE8.;
datalines;
12345 21JUN20 24JUN20
12345 28JUN20 18JUL20
12345 29JUL20 14AUG20
12345 09NOV20 20NOV20
67890 24APR20 07JUN20
67890 24JUN20 08JUL20
67890 21JUL20 26AUG20
97531 30APR17 05MAY17
97531 01MAY17 05MAY17
;
run;
proc sort data=example
out=example_sorted;
by employee_id enddate descending startdate;
run;
data want;
set example_sorted;
by employee_id enddate;
format final_startdate final_enddate date8. drop $8.;
if first.enddate
then do;
final_startdate = startdate;
final_enddate = enddate;
drop = "no";
end;
else
do;
final_startdate = .;
final_enddate = .;
drop = "yes";
end;
run;
proc print data=want;
run;
data example;
input employee_id $ startdate :DATE8. enddate :DATE8.;
format startdate enddate DATE8.;
datalines;
12345 21JUN20 24JUN20
12345 28JUN20 18JUL20
12345 29JUL20 14AUG20
12345 09NOV20 20NOV20
67890 24APR20 07JUN20
67890 24JUN20 08JUL20
67890 21JUL20 26AUG20
97531 30APR17 05MAY17
97531 01MAY17 05MAY17
;
run;
data want;
array enddt[999] _temporary_ ;
array stdt[999] _temporary_ ;
do i=1 by 1 until(last.employee_id);
set example;
by employee_id;
enddt[i]=enddate;
stdt[i]=startdate;
end;
do j=1 by 1 until(last.employee_id);
set example;
by employee_id;
if enddt[j+1]=enddate then do;
if (startdate - enddate) < (stdt[j+1] -enddate) then drop='yes';
end;
else drop='no';
output;
call missing(drop);
end;
drop i j;
run;
employee_ Obs id startdate enddate drop 1 12345 21JUN20 24JUN20 no 2 12345 28JUN20 18JUL20 no 3 12345 29JUL20 14AUG20 no 4 12345 09NOV20 20NOV20 no 5 67890 24APR20 07JUN20 no 6 67890 24JUN20 08JUL20 no 7 67890 21JUL20 26AUG20 no 8 97531 30APR17 05MAY17 yes 9 97531 01MAY17 05MAY17 no
Thank you Lapis.
Actually, I made a mistake in my initial question where I should have said that I want to drop the observation for employee_id 97531 where startdate=01MAY17 as opposed to when it =30APR17. I want to keep the longest observation. So, really the desired output looks like the following:
The code you suggested works with a minor modification to do this:
proc sort data=example out=example_sorted; by employee_id enddate descending startdate; run; data want; set example_sorted; by employee_id enddate; format final_startdate final_enddate date8. drop $8.; if last.enddate then do; final_startdate = startdate; final_enddate = enddate; drop = "no"; end; else do; final_startdate = .; final_enddate = .; drop = "yes"; end; run;
I simply replaced "first" with "last".
Thank you!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.