Hello,
I have a dataset that lists records by ID of exam date and result. I only want to output the data after an ID's second last 'PASS'. How do I achieve this?
Many thanks.
Max
Source data:
ID DATE RESULT
85 15JAN2010 FAIL
85 25JAN2010 PASS
85 15FEB2011 FAIL
85 25FEB2011 PASS
85 01MAR2011 FAIL
85 07MAR2011 FAIL
85 10MAR2011 PASS
66 10OCT2015 FAIL
66 15OCT2015 PASS
66 10SEP2016 FAIL
66 15SEP2016 PASS
Desired output data:
85 01MAR2011 FAIL
85 07MAR2011 FAIL
85 10MAR2011 PASS
66 10SEP2016 FAIL
66 15SEP2016 PASS
Your data appears to be sorted by date for each ID, but not sorted by ID. You can get what you need without further sorting of the data:
data have;
informat id $2. date date9. status $12.;
format date date9.;
input id date status $;
cards;
85 15JAN2010 FAIL
85 25JAN2010 PASS
85 15FEB2011 FAIL
85 25FEB2011 PASS
85 01MAR2011 FAIL
85 07MAR2011 FAIL
85 10MAR2011 PASS
66 10OCT2015 FAIL
66 15OCT2015 PASS
66 10SEP2016 FAIL
66 15SEP2016 PASS
;
run;
data want (drop=npass cutpass);
/* Count total number of "PASS" records in the group */
do npass=0 by 0 until (last.id);
set have;
by id notsorted;
if status='PASS' then npass=npass+1;
end;
cutpass=npass-1;
/* Reread,, but output only starting with record after NPASS reaches CUTPASS*/
do npass=0 by 0 until (last.id);
set have;
by id notsorted;
if npass>=cutpass then output;
/* if npass>=cutpass>0 then output; */
if status="PASS" then npass=npass+1;
end;
run;
@PGStats pointed out that my suggestion would output groups in which there existed no second-to-last "PASS" record. Excellent observation, and easily corrected. Using the commented "if npass ..." line instead of the original would address that issue.
Regards,
Mark
If you reverse the order of your data, ID by date descending, you can take records from the first until the second pass which is a much easier logical problem to solve.
Can you provide a sample code? Thanks.
data have;
informat id $2. date date9. status $12.;
format date date9.;
input id date status $;
cards;
85 15JAN2010 FAIL
85 25JAN2010 PASS
85 15FEB2011 FAIL
85 25FEB2011 PASS
85 01MAR2011 FAIL
85 07MAR2011 FAIL
85 10MAR2011 PASS
66 10OCT2015 FAIL
66 15OCT2015 PASS
66 10SEP2016 FAIL
66 15SEP2016 PASS
;
run;
proc sort data=have;
by id descending date;
run;
data want;
set have;
by id descending date;
retain flag;
if first.id then flag=0;
else if status='PASS' then flag=1;
if flag=0 then output;
run;
proc sort data=want;
by id date;
run;
proc print; run;
A single step solution:
data want;
do until (last.id);
set have(where=(status="PASS")); by id notsorted;
if not last.id then passDate = date;
end;
thisId = id;
do until (id = thisId and last.id);
set have; by id notsorted;
if id = thisId and -date < -passDate then output;
end;
drop thisId passDate;
run;
proc print noobs; run;
Note that this code skips Ids when all records for an Id are FAILs or there is a single PASS record. Other solutions proposed by @Reeza and @mkeintz output all obs in those cases.
Your data appears to be sorted by date for each ID, but not sorted by ID. You can get what you need without further sorting of the data:
data have;
informat id $2. date date9. status $12.;
format date date9.;
input id date status $;
cards;
85 15JAN2010 FAIL
85 25JAN2010 PASS
85 15FEB2011 FAIL
85 25FEB2011 PASS
85 01MAR2011 FAIL
85 07MAR2011 FAIL
85 10MAR2011 PASS
66 10OCT2015 FAIL
66 15OCT2015 PASS
66 10SEP2016 FAIL
66 15SEP2016 PASS
;
run;
data want (drop=npass cutpass);
/* Count total number of "PASS" records in the group */
do npass=0 by 0 until (last.id);
set have;
by id notsorted;
if status='PASS' then npass=npass+1;
end;
cutpass=npass-1;
/* Reread,, but output only starting with record after NPASS reaches CUTPASS*/
do npass=0 by 0 until (last.id);
set have;
by id notsorted;
if npass>=cutpass then output;
/* if npass>=cutpass>0 then output; */
if status="PASS" then npass=npass+1;
end;
run;
@PGStats pointed out that my suggestion would output groups in which there existed no second-to-last "PASS" record. Excellent observation, and easily corrected. Using the commented "if npass ..." line instead of the original would address that issue.
Regards,
Mark
Thanks so much for everyone who responded to my post! Thumbs up to SAS Communities!
Max
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.