Hi, Excelsius!

First, you need to sort your data by the variables listed in the BY statement, otherwise the step won't work (the same requirement as for MERGE with BY). In my code I changed the order of the rows in the example dataset, so if you remove PROC SORT the second DATA step will generate an error.

Now to how FIRST and LAST work. I think it is easiest to show with the following example (it reads the sorted T0 dataset that is created further down):

data outt;
set T0;
by id date1;
first_ID = first.ID;
last_ID = last.ID;
first_date = first.date1;
last_date = last.date1;
run;

first.ID and last.ID mark the beginning and the end of each ID group in the whole dataset.
first.DATE1 and last.DATE1 mark the beginning and the end of each DATE1 group inside each ID group.

So to find the start or the end of any (ID, DATE1) group in the dataset, you only need to look at FIRST and LAST for DATE1.

Now to finding the max value of DATE2. How do you find the max value in a sequence? You create a new variable, assign it the smallest possible value (in our case it can be some very old date like '01JAN1900'd; a missing value also works, because SAS treats missing as smaller than any number), read every value, and if it is greater than the current max, make it the new max.

The problem is that SAS resets variables created in the DATA step, such as our new max variable, to missing at the start of every iteration, that is, each time it reads a new observation from the input dataset. You can avoid this with the RETAIN statement: it tells SAS not to reset the variable to missing when reading a new observation.

Since you want the max value not for the whole dataset but for every group, manually set your variable to missing when reading the first observation in a group, and output when you reach the last observation in a group.

data T0;
input ID $ SEL $ DATE1 :mmddyy10. DATE2 :mmddyy10.;
format DATE1 mmddyy10. DATE2 mmddyy10.;
datalines;
100 a . .
102 d 09/17/2020 09/17/2020
103 e 09/17/2020 09/18/2020
103 e 09/18/2020 09/19/2020
102 c 09/14/2020 09/14/2020
103 a 09/18/2020 09/20/2020
103 a 09/18/2020 09/21/2020
101 b 09/13/2020 09/13/2020
;
run;
proc sort data=T0;
by id date1;
run;
data out;
set T0;
by id date1;
/* create new variable for max value */
length max_date 8;
format max_date mmddyy10.;
/* sas resets newly created variables to missing at the start of each iteration;
to avoid this use retain */
retain max_date;
/* set max_date to missing only when you are reading the first observation in a group */
if first.date1 then call missing (of max_date);
/* looking for max value */
if date2 > max_date then max_date = date2;
/* for the last observation in a group assign the max value to date2 and output */
if last.date1 then do;
date2 = max_date;
output;
end;
/* keep only the fields you need */
keep id date1 date2;
run;

Hope this helps 🙂

P.S. Here is a good paper about the BY statement