I would like to find the last row where testdate is earlier than date within each group. Thanks
Data like below:
ID TESTDATE DATE
A Nov 20, 2019 Nov 25, 2019
A Nov 30, 2019 Nov 25, 2019
A Dec 1, 2019 Nov 25, 2019
B Nov 1, 2019 Nov 18, 2019
B Nov 13, 2019 Nov 18, 2019
B Nov 24, 2019 Nov 18, 2019
If I understand correctly, and - I believe you confirmed - (1) data are sorted by id/testdate, (2) each id has a constant DATE, then you merely need to look ahead one obs to see whether the current obs satisfies the testdate<date condition and the next obs violates the condition:
data have;
input ID $ (TESTDATE DATE) (:date9.);
format TESTDATE DATE yymmdd10.;
datalines;
A 20nov2019 25nov2019
A 30nov2019 25nov2019
A 01dec2019 25nov2019
B 01nov2019 18nov2019
B 13nov2019 18nov2019
B 24nov2019 18nov2019
;
data want (drop=nxt_:);
set have (keep=id);
by id;
merge have have (firstobs=2 keep=testdate rename=(testdate=nxt_testdate));
if testdate<date and (last.id=1 or nxt_testdate>=date) then flag=1;
else flag=0;
run;
By looking ahead (using firstobs=2 in the second argument of the MERGE statement), you avoid the need to re-merge.
Now if you don't have a constant DATE value for a given ID, then a simple lookahead doesn't work. Instead you have to read each ID twice - the first time to scan all obs and identify the last (physically last) record satisfying the condition. The second time to output the results with flag=1 in the correct observation. In this case the data need to be grouped only by ID, no secondary sort key needed:
data want (drop=_:);
set have (in=firstpass) have (in=secondpass);
by id;
_nf+firstpass;
retain _last_qualifier;
if firstpass and testdate<date then _last_qualifier=_nf ;
if secondpass;
_ns+1;
flag=(_ns=_last_qualifier);
run;
Is date always the same for each record in an ID? Is the testdate always sorted in chronological order? Is "last row" the same as highest value of testdate?
If so
data want;
set have(where=(testdate<date));
by id;
if last.id;
run;
If those assumptions are not valid, then please explain the organization of the data in a lot more detail.
answer are all "YES" for your questions. What if I want to flag that record in the original dataset? Should I just merge the new dataset and original? I came out this way when I was doing the programming. Is there any other way without creating a new dataset and merge back?
Original data set needs a flag. 🏁 Okay
Yes, you could merge the data set from my code back in with the original data set (and then optionally delete the data set from my code). Or this:
proc sql;
create table want as select a.*,a.testdate=max(b.testdate) as flag
from have as a left join have(where=(testdate<date)) as b on a.id=b.id and a.testdate=b.testdate
group by b.id
order by a.id,a.testdate;
quit;
Or via a Sort and data step - but I'd be using the SQL as proposed by @PGStats
data have;
input ID $ (TESTDATE DATE) (:date9.);
format TESTDATE DATE yymmdd10.;
datalines;
A 20nov2019 25nov2019
A 30nov2019 25nov2019
A 01dec2019 25nov2019
B 01nov2019 18nov2019
B 13nov2019 18nov2019
B 24nov2019 18nov2019
;
proc sort data=have(where=(testdate<date)) out=want;
by id date testdate;
run;
data want;
set want;
by id;
if last.date;
run;
@Patrick wrote:
Or via a Sort and data step - but I'd be using the SQL as proposed by @PGStats
data have; input ID $ (TESTDATE DATE) (:date9.); format TESTDATE DATE yymmdd10.; datalines; A 20nov2019 25nov2019 A 30nov2019 25nov2019 A 01dec2019 25nov2019 B 01nov2019 18nov2019 B 13nov2019 18nov2019 B 24nov2019 18nov2019 ; proc sort data=have(where=(testdate<date)) out=want; by id date testdate; run; data want; set want; by id; if last.date; run;
Already proposed, and furthermore the OP states that she wants all the records in the original data set and a new variable flag to indicate which is 1 if this is the record of interest and 0 otherwise.
Use the auto-remerge feature of proc SQL to do this with a single query:
data have;
input ID $ (TESTDATE DATE) (:date9.);
format TESTDATE DATE yymmdd10.;
datalines;
A 20nov2019 25nov2019
A 30nov2019 25nov2019
A 01dec2019 25nov2019
B 01nov2019 18nov2019
B 13nov2019 18nov2019
B 24nov2019 18nov2019
;
proc sql;
create table want as
select *
from have
where testdate lt date
group by id
having testdate = max(testdate);
select * from want;
quit;
If I understand correctly, and - I believe you confirmed - (1) data are sorted by id/testdate, (2) each id has a constant DATE, then you merely need to look ahead one obs to see whether the current obs satisfies the testdate<date condition and the next obs violates the condition:
data have;
input ID $ (TESTDATE DATE) (:date9.);
format TESTDATE DATE yymmdd10.;
datalines;
A 20nov2019 25nov2019
A 30nov2019 25nov2019
A 01dec2019 25nov2019
B 01nov2019 18nov2019
B 13nov2019 18nov2019
B 24nov2019 18nov2019
;
data want (drop=nxt_:);
set have (keep=id);
by id;
merge have have (firstobs=2 keep=testdate rename=(testdate=nxt_testdate));
if testdate<date and (last.id=1 or nxt_testdate>=date) then flag=1;
else flag=0;
run;
By looking ahead (using firstobs=2 in the second argument of the MERGE statement), you avoid the need to re-merge.
Now if you don't have a constant DATE value for a given ID, then a simple lookahead doesn't work. Instead you have to read each ID twice - the first time to scan all obs and identify the last (physically last) record satisfying the condition. The second time to output the results with flag=1 in the correct observation. In this case the data need to be grouped only by ID, no secondary sort key needed:
data want (drop=_:);
set have (in=firstpass) have (in=secondpass);
by id;
_nf+firstpass;
retain _last_qualifier;
if firstpass and testdate<date then _last_qualifier=_nf ;
if secondpass;
_ns+1;
flag=(_ns=_last_qualifier);
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.