BookmarkSubscribeRSS Feed
SASPreK
Fluorite | Level 6
Hi!
I have the following dataset

Data have;
input id $  dt:yymmdd10. year  test $  result $;
format dt  date11.;
datalines;
ABCDE   20100429 2010   T1   N 
ABCDE   20100712 2010   T1   N
ABCDE   20101025 2010   T1   Y
ABCDE   20110509 2011   T1   Y
ABCDE   20110909 2011   T1   Y
ABCUK   20150301 2015   T2   Y
ABCUK   20150801 2015   T1   N
ABCUK   20150914 2015   T1   N
ABCGH   20190613 2019   T1   N
ABCGH   20190815 2019   T2   N
ABCPK   20130409 2013   T1   N
ABCPK   20131021 2013   T1   N
;


I want to select the record with the latest date in a year for all records with same id and year. I also want if a record that has a result ‘Y’ but not the latest date, it should be prioritised and selected over other records with result ‘N’, same ID and date in later months. In summary, my objective is to have only one record (from same IDs) in a year, if the result variable is ‘Y’ or ‘N’ for all, I want the record with the latest date. Any record (with same ID) with result ‘Y’ in an earlier date should be selected over a record with result ‘N’ in a later date. The output should have only one record for each ID in each year. See below:

Want
id dt Year test result
ABCDE 20101025 2010 T1 Y
ABCDE 20110909 2011 T1 Y
ABCUK 20150301 2015 T2 Y
ABCGH 20190815 2019 T2
ABCPK 20131021 2013 T1

I have tried
Price sql;
Create table want as
Select*
From have
Group by id
Having dt=max(dt);
Quit;

But it does not give the output based on my specific requirements.

Thank you for your help!
4 REPLIES 4
Amir
PROC Star

Hi,

 

Are you looking for something like the following? If not please clarify what should be different.

 

/* sort in required order */
proc sort data = have
          out  = have_sorted;

  by id year result dt;
run;


/* obtain last record per id & year */
data want;
  set have_sorted;
  
  by id year result dt;
  
  if last.year;
run;

 

 

Thanks & kind regards,

Amir.

PaigeMiller
Diamond | Level 26

@SASPreK wrote:

I have tried
Price sql;
Create table want as
Select*
From have
Group by id
Having dt=max(dt);
Quit;

But it does not give the output based on my specific requirements.

PRICE SQL ??? That won't work.

 

Also, you didn't tell SQL to do this for each year. How about this?

 

proc sql;
    create table want as
    select *
    from have
    group by id,year
    having dt=max(dt);
quit;

 

 

--
Paige Miller
Ksharp
Super User

Data have;
input id $  dt:yymmdd10. year  test $  result $;
format dt  date11.;
datalines;
ABCDE   20100429 2010   T1   N 
ABCDE   20100712 2010   T1   N
ABCDE   20101025 2010   T1   Y
ABCDE   20110509 2011   T1   Y
ABCDE   20110909 2011   T1   Y
ABCUK   20150301 2015   T2   Y
ABCUK   20150801 2015   T1   N
ABCUK   20150914 2015   T1   N
ABCGH   20190613 2019   T1   N
ABCGH   20190815 2019   T2   N
ABCPK   20130409 2013   T1   N
ABCPK   20131021 2013   T1   N
;
proc sql;
create table part1 as
select * from have where result='Y' group by id,year having dt=max(dt);
create table part2 as
select * from have where catx('|',id,year) not in ( select catx('|',id,year)  from part1) 
 group by id,year having dt=max(dt);
create table want as
select * from part1 union select * from part2;
quit;

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 912 views
  • 1 like
  • 5 in conversation