BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AJ_Brien
Quartz | Level 8

Hello,

 

I have a dataset such that for each value of 'acc', I want to select the row with the highest 'DT' value where flag='Y'. Below is the input dataset:

 

FormIDDTaccflag
60663233221001Oct2019 0:00:00.000testY
60699093221003Oct2019 0:00:00.000test 
60699143221003Oct2019 0:00:00.000test 
60699213221003Oct2019 0:00:00.000testY
60699983221003Oct2019 0:00:00.000test 
60699998221003Oct2019 0:00:00.000test 
60700023221003Oct2019 0:00:00.000test 
61120139231712Nov2019 0:00:00.000test1Y

 

This is what I've come up with so far:

 

proc sql;
create table abc as
select * 
from sample where flag = 'Y' having dt = max(dt);
quit;

But this gives me the following output:

 

FormIDDTaccflag
61120139231712Nov2019 0:00:00.000test1Y

 

But I'm trying to get the following output: I don't need the max value of DT for the entire table, but looking to get the max DT value per acc.

FormIDDTaccflag
60699213221003Oct2019 0:00:00.000testY
61120139231712Nov2019 0:00:00.000test1Y

 

Appreciate your help and time 🙂

 

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Add a 

group by acc;
proc sql;
create table abc as
select * 
from sample where flag = 'Y' 
group by acc
having dt = max(dt);
quit;

 

View solution in original post

10 REPLIES 10
PeterClemmensen
Tourmaline | Level 20

I don't understand this. Acc is constand besides one obs?

AJ_Brien
Quartz | Level 8
so diff forms are submitted for a certain 'acc' value on different dates 'dt', hence you see multiple entries with different form values for the same account but on different dates. So I need to look for the most recent form submission for every acc value. hope this provides some context.
novinosrin
Tourmaline | Level 20

Add a 

group by acc;
proc sql;
create table abc as
select * 
from sample where flag = 'Y' 
group by acc
having dt = max(dt);
quit;

 

AJ_Brien
Quartz | Level 8
I tried that earlier, that gives an error:

ERROR 22-322: Syntax error, expecting one of the following: ;, !, !!, &, *, **, +, -, /, <, <=, <>, =, >, >=, AND, EQ, EQT, EXCEPT,
GE, GET, GT, GTT, INTERSECT, LE, LET, LT, LTT, NE, NET, NOT, OR, ORDER, OUTER, UNION, ^, ^=, |, ||, ~, ~=.

ERROR 202-322: The option or parameter is not recognized and will be ignored.
novinosrin
Tourmaline | Level 20

The SQL clauses are as follows:

 

1. SELECT- MUST

2. FROM-MUST

3. WHERE (IF ANY)

4. GROUP BY (IF ANY)

5. HAVING (IF A GROUPED FILTER IS REQUIRED)

6. ORDER BY (IF YOU WANT TO ORDER)

 

Courtesy: @PGStats  aka prodigy genius stats who corrected me 2 years ago. I took those notes seriously after 🙂

 

Here is the test:

 



data have;
infile cards truncover;
input Form	ID @21	DT datetime21. acc $	flag $;
format dt datetime20.;
cards;
60663233    2210   	01Oct2019 0:00:00.000	test	Y
60699093	2210	03Oct2019 0:00:00.000	test	 
60699143	2210	03Oct2019 0:00:00.000	test	 
60699213	2210	03Oct2019 0:00:00.000	test	Y
60699983	2210	03Oct2019 0:00:00.000	test	 
60699998	2210	03Oct2019 0:00:00.000	test	 
60700023	2210	03Oct2019 0:00:00.000	test	 
61120139	2317	12Nov2019 0:00:00.000	test1	Y
;

proc sql;
create table abc as
select * 
from have 
where flag = 'Y' 
group by acc
having dt = max(dt);
quit;

proc print noobs;run;
Form ID DT acc flag
60699213 2210 03OCT2019:00:00:00 test Y
61120139 2317 12NOV2019:00:00:00 test1 Y

 

 

AJ_Brien
Quartz | Level 8
this is excellent piece, I will note it down 🙂
thank you!
AJ_Brien
Quartz | Level 8
nevermind, I was adding group by after having instead of before. This works. Thank you!!
mkeintz
PROC Star

If your data are already sorted by ACC:

 

data abc;
  infile datalines missover;
  input Form ID DT :datetime21.3 acc :$5. flag :$1. ;
  format dt datetime24.3;
datalines;
60663233 2210 01Oct2019:0:00:00.000 test Y 
60699093 2210 03Oct2019:0:00:00.000 test   
60699143 2210 03Oct2019:0:00:00.000 test   
60699213 2210 03Oct2019:0:00:00.000 test Y 
60699983 2210 03Oct2019:0:00:00.000 test   
60699998 2210 03Oct2019:0:00:00.000 test   
60700023 2210 03Oct2019:0:00:00.000 test   
61120139 2317 12Nov2019:0:00:00.000 test1 Y 
run;

data want (drop=_:);
  do until (last.acc);
    set abc;
    where flag='Y';
    by acc;
    _maxdt=max(dt,_maxdt);
  end;
  do until (last.acc);
    set abc;
    where flag='Y';
    by acc;
    if dt=_maxdt then put (_all_) (=); *output;
  end;
run;

This program doesn't bother reading in observations unless it has flag='Y'.  The where flag="Y";  statements outsource the filtering process to the data engine, so only the qualifyi8ng records are seen by the data step.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
hashman
Ammonite | Level 13

@AJ_Brien:

A non-SQL way you can approach this with depends on whether your input is sorted by ACC or not. If it is sorted:

data have ;                                                                                                                             
  input form id dt :datetime18. acc $ flag $ ;                                                                                          
  format dt datetime. ;                                                                                                                 
  cards ;                                                                                                                               
60663233  2210  01Oct2019:00:00:00  test   Y                                                                                            
60699093  2210  03Oct2019:00:00:00  test   .                                                                                            
60699143  2210  03Oct2019:00:00:00  test   .                                                                                            
60699213  2210  03Oct2019:00:00:00  test   Y                                                                                            
60699983  2210  03Oct2019:00:00:00  test   .                                                                                            
60699998  2210  03Oct2019:00:00:00  test   .                                                                                            
60700023  2210  03Oct2019:00:00:00  test   .                                                                                            
61120139  2317  12Nov2019:00:00:00  test1  Y                                                                                            
;                                                                                                                                       
run ;                                                                                                                                   
                                                                                                                                        
data want ;                                                                                                                             
  do until (last.acc) ;                                                                                                                 
    set have (where=(flag="Y")) curobs = _q ;                                                                                           
    by acc ;                                                                                                                            
    if dt < _n_ then continue ;                                                                                                         
    _n_ = dt ;                                                                                                                          
    _s  = _q ;                                                                                                                          
  end ;                                                                                                                                 
  set have point = _s ;                                                                                                                 
run ;               

If it is not sorted:

data want ;                                                                                                                             
  dcl hash h () ;                                                                                                                       
  h.definekey ("acc") ;                                                                                                                 
  h.definedata ("_n_", "_s") ;                                                                                                          
  h.definedone () ;                                                                                                                     
  do until (z) ;                                                                                                                        
    set have curobs = _q end = z ;                                                                                                      
    where flag = "Y" ;                                                                                                                  
    if h.find() ne 0 then call missing (_n_, _s) ;                                                                                      
    if dt <= _n_ then continue ;                                                                                                        
    _n_ = dt ;                                                                                                                          
    _s  = _q ;                                                                                                                          
    h.replace() ;                                                                                                                       
  end ;                                                                                                                                 
  dcl hiter hi ("h") ;                                                                                                                  
  do while (hi.next() = 0) ;                                                                                                            
    set have point = _s ;                                                                                                               
    output ;                                                                                                                            
  end ;                                                                                                                                 
run ;                     

In both  cases, if there's a tie between two "Y" records with the same maximal date, the first occurrence is selected. Note that self-merging SQL offered by @novinosrin won't handle this and instead return all the tied records.

 

Kind regards

PaulD.

novinosrin
Tourmaline | Level 20

Guru @hashman   Yes fully agree and point well taken. Yet another exuberance of "attention to detail". Kudos!

 

@AJ_Brien  

 

My thoughts  & assumptions:

 

1. if your dataset doesn't have any more variables than the ones listed and the values are truly representative of your real, select distinct * should handle the duplicates and will pick only instance

2. Point 1 essentially means you could have extra records of max dt and relies on this consistency. Strictly speaking, if even FORMNO changes for the same records that tied with the max dates, the select distinct * will of course fail.

3. If what Hashman pointed happens to be true, his solution is bullet proof and robust.

 

Hashman's diligence should have opened your eyes too to carefully examine the ties and associated variables in a record. Please make sure you keep note of it. 

 

 

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2564 views
  • 3 likes
  • 5 in conversation