BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JillChen0131
Fluorite | Level 6

I would like to find the last row where testdate is earlier than date within each group. Thanks

Data like below:

ID   TESTDATE      DATE

A   Nov 20, 2019   Nov 25, 2019

A   Nov 30, 2019   Nov 25, 2019

A   Dec 1, 2019     Nov 25, 2019

B   Nov 1, 2019    Nov 18, 2019

B   Nov 13, 2019   Nov 18, 2019

B   Nov 24, 2019   Nov 18, 2019

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

If I understand correctly, and - I believe you confirmed - (1) data are sorted by id/testdate, (2) each id has a constant DATE, then you merely need to look ahead one obs to see whether the current obs satisfies the testdate<date condition and the next obs violates the condition:

 

data have;
input ID $  (TESTDATE DATE) (:date9.);
format TESTDATE DATE yymmdd10.;
datalines;
A   20nov2019  25nov2019
A   30nov2019  25nov2019
A   01dec2019  25nov2019
B   01nov2019  18nov2019
B   13nov2019  18nov2019
B   24nov2019  18nov2019
;
data want (drop=nxt_:);
  set have (keep=id);
  by id;
  merge have have (firstobs=2 keep=testdate rename=(testdate=nxt_testdate));
  if testdate<date and (last.id=1 or nxt_testdate>=date) then flag=1;
  else flag=0;
run;

By looking ahead (using firstobs=2 in the second argument of the MERGE statement), you avoid the need to re-merge.

 

Now if you don't have a constant DATE value for a given ID, then a simple lookahead doesn't work.  Instead you have to read each ID twice - the first time to scan all obs and identify the last (physically last) record satisfying the condition.  The second time to output the results with flag=1 in the correct observation.  In this case the data need to be grouped only by ID, no secondary sort key needed:

data want (drop=_:);
  set have (in=firstpass) have (in=secondpass);
  by id;
  _nf+firstpass;
  retain _last_qualifier;
  if firstpass and testdate<date then _last_qualifier=_nf ;
  if secondpass;
  _ns+1;
  flag=(_ns=_last_qualifier);
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

Is date always the same for each record in an ID? Is the testdate always sorted in chronological order? Is "last row" the same as highest value of testdate?

 

If so

 

data want;
    set have(where=(testdate<date));
    by id;
    if last.id;
run;

If those assumptions are not valid, then please explain the organization of the data in a lot more detail.

--
Paige Miller
JillChen0131
Fluorite | Level 6

answer are all "YES" for your questions. What if I want to flag that record in the original dataset? Should I just merge the new dataset and original? I came out this way when I was doing the programming. Is there any other way without creating a new dataset and merge back?

PaigeMiller
Diamond | Level 26

Original data set needs a flag. 🏁 Okay

 

Yes, you could merge the data set from my code back in with the original data set (and then optionally delete the data set from my code). Or this:

 

proc sql;
    create table want as select a.*,a.testdate=max(b.testdate) as flag
    from have as a left join have(where=(testdate<date)) as b on a.id=b.id and a.testdate=b.testdate
    group by b.id
    order by a.id,a.testdate;
quit;

 

 

--
Paige Miller
Patrick
Opal | Level 21

Or via a Sort and data step - but I'd be using the SQL as proposed by @PGStats 

data have;
input ID $  (TESTDATE DATE) (:date9.);
format TESTDATE DATE yymmdd10.;
datalines;
A   20nov2019  25nov2019
A   30nov2019  25nov2019
A   01dec2019  25nov2019
B   01nov2019  18nov2019
B   13nov2019  18nov2019
B   24nov2019  18nov2019
;

proc sort data=have(where=(testdate<date)) out=want;
 by id date testdate;
run;

data want;
  set want;
  by id;
  if last.date;
run;
PaigeMiller
Diamond | Level 26

@Patrick wrote:

Or via a Sort and data step - but I'd be using the SQL as proposed by @PGStats 

data have;
input ID $  (TESTDATE DATE) (:date9.);
format TESTDATE DATE yymmdd10.;
datalines;
A   20nov2019  25nov2019
A   30nov2019  25nov2019
A   01dec2019  25nov2019
B   01nov2019  18nov2019
B   13nov2019  18nov2019
B   24nov2019  18nov2019
;

proc sort data=have(where=(testdate<date)) out=want;
 by id date testdate;
run;

data want;
  set want;
  by id;
  if last.date;
run;

Already proposed, and furthermore the OP states that she wants all the records in the original data set and a new variable flag to indicate which is 1 if this is the record of interest and 0 otherwise.

--
Paige Miller
PGStats
Opal | Level 21

Use the auto-remerge feature of proc SQL to do this with a single query:

 

data have;
input ID $  (TESTDATE DATE) (:date9.);
format TESTDATE DATE yymmdd10.;
datalines;
A   20nov2019  25nov2019
A   30nov2019  25nov2019
A   01dec2019  25nov2019
B   01nov2019  18nov2019
B   13nov2019  18nov2019
B   24nov2019  18nov2019
;

proc sql;
create table want as
select * 
from have
where testdate lt date
group by id
having testdate = max(testdate);
select * from want;
quit;

PGStats_0-1646427950044.png

 

PG
mkeintz
PROC Star

If I understand correctly, and - I believe you confirmed - (1) data are sorted by id/testdate, (2) each id has a constant DATE, then you merely need to look ahead one obs to see whether the current obs satisfies the testdate<date condition and the next obs violates the condition:

 

data have;
input ID $  (TESTDATE DATE) (:date9.);
format TESTDATE DATE yymmdd10.;
datalines;
A   20nov2019  25nov2019
A   30nov2019  25nov2019
A   01dec2019  25nov2019
B   01nov2019  18nov2019
B   13nov2019  18nov2019
B   24nov2019  18nov2019
;
data want (drop=nxt_:);
  set have (keep=id);
  by id;
  merge have have (firstobs=2 keep=testdate rename=(testdate=nxt_testdate));
  if testdate<date and (last.id=1 or nxt_testdate>=date) then flag=1;
  else flag=0;
run;

By looking ahead (using firstobs=2 in the second argument of the MERGE statement), you avoid the need to re-merge.

 

Now if you don't have a constant DATE value for a given ID, then a simple lookahead doesn't work.  Instead you have to read each ID twice - the first time to scan all obs and identify the last (physically last) record satisfying the condition.  The second time to output the results with flag=1 in the correct observation.  In this case the data need to be grouped only by ID, no secondary sort key needed:

data want (drop=_:);
  set have (in=firstpass) have (in=secondpass);
  by id;
  _nf+firstpass;
  retain _last_qualifier;
  if firstpass and testdate<date then _last_qualifier=_nf ;
  if secondpass;
  _ns+1;
  flag=(_ns=_last_qualifier);
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 5183 views
  • 1 like
  • 5 in conversation