BookmarkSubscribeRSS Feed
SASPreK
Fluorite | Level 6
Hi!
I have the following dataset

Data have;
input id $  dt:yymmdd10. year  test $  result $;
format dt  date11.;
datalines;
ABCDE   20100429 2010   T1   N 
ABCDE   20100712 2010   T1   N
ABCDE   20101025 2010   T1   Y
ABCDE   20110509 2011   T1   Y
ABCDE   20110909 2011   T1   Y
ABCUK   20150301 2015   T2   Y
ABCUK   20150801 2015   T1   N
ABCUK   20150914 2015   T1   N
ABCGH   20190613 2019   T1   N
ABCGH   20190815 2019   T2   N
ABCPK   20130409 2013   T1   N
ABCPK   20131021 2013   T1   N
;


I want to select the record with the latest date in a year for all records with same id and year. I also want if a record that has a result ‘Y’ but not the latest date, it should be prioritised and selected over other records with result ‘N’, same ID and date in later months. In summary, my objective is to have only one record (from same IDs) in a year, if the result variable is ‘Y’ or ‘N’ for all, I want the record with the latest date. Any record (with same ID) with result ‘Y’ in an earlier date should be selected over a record with result ‘N’ in a later date. The output should have only one record for each ID in each year. See below:

Want
id dt Year test result
ABCDE 20101025 2010 T1 Y
ABCDE 20110909 2011 T1 Y
ABCUK 20150301 2015 T2 Y
ABCGH 20190815 2019 T2
ABCPK 20131021 2013 T1

I have tried
Price sql;
Create table want as
Select*
From have
Group by id
Having dt=max(dt);
Quit;

But it does not give the output based on my specific requirements.

Thank you for your help!
5 REPLIES 5
Amir
PROC Star

Hi,

 

Are you looking for something like the following? If not please clarify what should be different.

 

/* sort in required order */
proc sort data = have
          out  = have_sorted;

  by id year result dt;
run;


/* obtain last record per id & year */
data want;
  set have_sorted;
  
  by id year result dt;
  
  if last.year;
run;

 

 

Thanks & kind regards,

Amir.

PaigeMiller
Diamond | Level 26

@SASPreK wrote:

I have tried
Price sql;
Create table want as
Select*
From have
Group by id
Having dt=max(dt);
Quit;

But it does not give the output based on my specific requirements.

PRICE SQL ??? That won't work.

 

Also, you didn't tell SQL to do this for each year. How about this?

 

proc sql;
    create table want as
    select *
    from have
    group by id,year
    having dt=max(dt);
quit;

 

 

--
Paige Miller
Ksharp
Super User

Data have;
input id $  dt:yymmdd10. year  test $  result $;
format dt  date11.;
datalines;
ABCDE   20100429 2010   T1   N 
ABCDE   20100712 2010   T1   N
ABCDE   20101025 2010   T1   Y
ABCDE   20110509 2011   T1   Y
ABCDE   20110909 2011   T1   Y
ABCUK   20150301 2015   T2   Y
ABCUK   20150801 2015   T1   N
ABCUK   20150914 2015   T1   N
ABCGH   20190613 2019   T1   N
ABCGH   20190815 2019   T2   N
ABCPK   20130409 2013   T1   N
ABCPK   20131021 2013   T1   N
;
proc sql;
create table part1 as
select * from have where result='Y' group by id,year having dt=max(dt);
create table part2 as
select * from have where catx('|',id,year) not in ( select catx('|',id,year)  from part1) 
 group by id,year having dt=max(dt);
create table want as
select * from part1 union select * from part2;
quit;
dxiao2017
Lapis Lazuli | Level 10

Hi @SASPreK ,your requirement should be done in several steps using different techniques, it would not be able to achieve in a single step through a single technique (e.g., I tried many times with retain and if.first /if.last techniques but failed and almost give up this question and then switched to other techniques). The steps basically include the follows: (a) produce the final results column according to your requirements, this can be done with proc transpose and array, and (b) output the last id and year row together with the final results column, this can be done though if.last  within a data step and then a sql join. The code and results are as follows.

data have;
input id $  dt:yymmdd10. year  test $  
      result $;
format dt  date11.;
datalines;
ABCDE   20100429 2010   T1   N 
ABCDE   20100712 2010   T1   N
ABCDE   20101025 2010   T1   Y
ABCDE   20110509 2011   T1   Y
ABCDE   20110909 2011   T1   Y
ABCUK   20150301 2015   T2   Y
ABCUK   20150801 2015   T1   N
ABCUK   20150914 2015   T1   N
ABCGH   20190613 2019   T1   N
ABCGH   20190815 2019   T2   N
ABCPK   20130409 2013   T1   N
ABCPK   20131021 2013   T1   N
;
run;
proc print data=have;run;

/*produce final results column according to
requirements, using transpose and array*/
proc sort data=have out=have1;
   by id dt;
run;
proc print data=have1;run;
proc transpose data=have1 out=have2;
   by id;
   var result;
run;
proc print data=have2;run;
data have3;
   set have2;
   array rst[5] $ col1-col5;
   do i=1 to 5;
      if rst[i]='Y' then result='Y';
   end;
run;
proc print data=have3;run;

/*output the last id and year row together
with the final results, using if.last in
a data step and sql join*/
data have4;
   set have1;
   by id year;
   if last.year;
   drop result;
run;
proc print data=have4;run;
proc sql;
select h4.*,
       h3.result      
   from have4 as h4 left join
        have3 as h3
   on h4.id=h3.id;
quit;

dxiao2017_0-1754999003863.png

dxiao2017_2-1754999065736.png

dxiao2017_4-1754999251516.png

 

dxiao2017_5-1754999310883.png

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1690 views
  • 1 like
  • 6 in conversation