Re: Selecting records with latest date in each year

SASPreK · Posted 05-11-2025 03:13 AM

Hi!
I have the following dataset

Data have;
input id $ dt:yymmdd10. year test $ result $;
format dt date11.;
datalines;
ABCDE   20100429 2010   T1   N
ABCDE   20100712 2010   T1   N
ABCDE   20101025 2010   T1   Y
ABCDE   20110509 2011   T1   Y
ABCDE   20110909 2011   T1   Y
ABCUK   20150301 2015   T2   Y
ABCUK   20150801 2015   T1   N
ABCUK   20150914 2015   T1   N
ABCGH   20190613 2019   T1   N
ABCGH   20190815 2019   T2   N
ABCPK   20130409 2013   T1   N
ABCPK   20131021 2013   T1   N
;

I want to select the record with the latest date in a year for all records with same id and year. I also want if a record that has a result ‘Y’ but not the latest date, it should be prioritised and selected over other records with result ‘N’, same ID and date in later months. In summary, my objective is to have only one record (from same IDs) in a year, if the result variable is ‘Y’ or ‘N’ for all, I want the record with the latest date. Any record (with same ID) with result ‘Y’ in an earlier date should be selected over a record with result ‘N’ in a later date. The output should have only one record for each ID in each year. See below:

Want
id dt Year test result
ABCDE 20101025 2010 T1 Y
ABCDE 20110909 2011 T1 Y
ABCUK 20150301 2015 T2 Y
ABCGH 20190815 2019 T2
ABCPK 20131021 2013 T1

I have tried
Price sql;
Create table want as
Select*
From have
Group by id
Having dt=max(dt);
Quit;

But it does not give the output based on my specific requirements.

Thank you for your help!

Amir · Posted 05-11-2025 05:33 AM

Hi,

Are you looking for something like the following? If not please clarify what should be different.

/* sort in required order */
proc sort data = have
          out  = have_sorted;

  by id year result dt;
run;


/* obtain last record per id & year */
data want;
  set have_sorted;
  
  by id year result dt;
  
  if last.year;
run;

Thanks & kind regards,

Amir.

PaigeMiller · Posted 05-11-2025 06:08 AM

@SASPreK wrote:

I have tried
Price sql;
Create table want as
Select*
From have
Group by id
Having dt=max(dt);
Quit;

But it does not give the output based on my specific requirements.

PRICE SQL ??? That won't work.

Also, you didn't tell SQL to do this for each year. How about this?

proc sql;
    create table want as
    select *
    from have
    group by id,year
    having dt=max(dt);
quit;

--
Paige Miller

Ksharp · Posted 05-11-2025 10:12 AM


Data have;
input id $  dt:yymmdd10. year  test $  result $;
format dt  date11.;
datalines;
ABCDE   20100429 2010   T1   N 
ABCDE   20100712 2010   T1   N
ABCDE   20101025 2010   T1   Y
ABCDE   20110509 2011   T1   Y
ABCDE   20110909 2011   T1   Y
ABCUK   20150301 2015   T2   Y
ABCUK   20150801 2015   T1   N
ABCUK   20150914 2015   T1   N
ABCGH   20190613 2019   T1   N
ABCGH   20190815 2019   T2   N
ABCPK   20130409 2013   T1   N
ABCPK   20131021 2013   T1   N
;
proc sql;
create table part1 as
select * from have where result='Y' group by id,year having dt=max(dt);
create table part2 as
select * from have where catx('|',id,year) not in ( select catx('|',id,year)  from part1) 
 group by id,year having dt=max(dt);
create table want as
select * from part1 union select * from part2;
quit;

Kurt_Bremser · Posted 05-11-2025 01:08 PM

PROC SORT / DATA step method:

proc sort data=have;
by id year result dt;
run;

data want;
set have;
by id year;
if last.year;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

dxiao2017 · Posted 08-12-2025 07:52 AM

Hi @SASPreK ,your requirement should be done in several steps using different techniques, it would not be able to achieve in a single step through a single technique (e.g., I tried many times with retain and if.first /if.last techniques but failed and almost give up this question and then switched to other techniques). The steps basically include the follows: (a) produce the final results column according to your requirements, this can be done with proc transpose and array, and (b) output the last id and year row together with the final results column, this can be done though if.last within a data step and then a sql join. The code and results are as follows.

data have;
input id $  dt:yymmdd10. year  test $  
      result $;
format dt  date11.;
datalines;
ABCDE   20100429 2010   T1   N 
ABCDE   20100712 2010   T1   N
ABCDE   20101025 2010   T1   Y
ABCDE   20110509 2011   T1   Y
ABCDE   20110909 2011   T1   Y
ABCUK   20150301 2015   T2   Y
ABCUK   20150801 2015   T1   N
ABCUK   20150914 2015   T1   N
ABCGH   20190613 2019   T1   N
ABCGH   20190815 2019   T2   N
ABCPK   20130409 2013   T1   N
ABCPK   20131021 2013   T1   N
;
run;
proc print data=have;run;

/*produce final results column according to
requirements, using transpose and array*/
proc sort data=have out=have1;
   by id dt;
run;
proc print data=have1;run;
proc transpose data=have1 out=have2;
   by id;
   var result;
run;
proc print data=have2;run;
data have3;
   set have2;
   array rst[5] $ col1-col5;
   do i=1 to 5;
      if rst[i]='Y' then result='Y';
   end;
run;
proc print data=have3;run;

/*output the last id and year row together
with the final results, using if.last in
a data step and sql join*/
data have4;
   set have1;
   by id year;
   if last.year;
   drop result;
run;
proc print data=have4;run;
proc sql;
select h4.*,
       h3.result      
   from have4 as h4 left join
        have3 as h3
   on h4.id=h3.id;
quit;

Selecting records with latest date in each year