I have an issue that I think might work best with a do until loop. I have patient data for multiple hospital admissions and I want to keep all the records up until and including their first positive test, but delete the records after the first positive record.
Here's the data I have:
repid | date | postive |
125 | 1/12/2014 | 0 |
125 | 1/15/2014 | 1 |
1447 | 4/12/2013 | 0 |
1447 | 11/12/2013 | 0 |
1447 | 5/6/2014 | 0 |
1447 | 6/7/2014 | 0 |
1447 | 11/20/2014 | 1 |
1447 | 1/10/2015 | 0 |
1447 | 3/15/2015 | 0 |
88459 | 1/6/2014 | 0 |
454878 | 12/8/2014 | 0 |
316 | 2/9/2014 | 0 |
316 | 3/18/2014 | 0 |
316 | 6/2/2014 | 0 |
316 | 9/28/2014 | 1 |
316 | 12/1/2014 | 0 |
Want:
repid | date | tive |
125 | 1/12/2014 | 0 |
125 | 1/15/2014 | 1 |
1447 | 4/12/2013 | 0 |
1447 | 11/12/2013 | 0 |
1447 | 5/6/2014 | 0 |
1447 | 6/7/2014 | 0 |
1447 | 11/20/2014 | 1 |
88459 | 1/6/2014 | 0 |
454878 | 12/8/2014 | 0 |
316 | 2/9/2014 | 0 |
316 | 3/18/2014 | 0 |
316 | 6/2/2014 | 0 |
316 | 9/28/2014 | 1 |
Thanks,
Sarah
Maybe not the DO UNTIL you had in mind...
data have;
input repid date :mmddyy. positive;
format date yymmdd10.;
datalines;
125 1/12/2014 0
125 1/15/2014 1
1447 4/12/2013 0
1447 11/12/2013 0
1447 5/6/2014 0
1447 6/7/2014 0
1447 11/20/2014 1
1447 1/10/2015 0
1447 3/15/2015 0
88459 1/6/2014 0
454878 12/8/2014 0
316 2/9/2014 0
316 3/18/2014 0
316 6/2/2014 0
316 9/28/2014 1
316 12/1/2014 0
;
proc sort data=have; by repid date; run;
data want;
do until(last.repid);
set have; by repid;
if not pos then output;
if positive then pos = 1;
end;
drop pos;
run;
proc print data=want noobs; run;
PG
Maybe not the DO UNTIL you had in mind...
data have;
input repid date :mmddyy. positive;
format date yymmdd10.;
datalines;
125 1/12/2014 0
125 1/15/2014 1
1447 4/12/2013 0
1447 11/12/2013 0
1447 5/6/2014 0
1447 6/7/2014 0
1447 11/20/2014 1
1447 1/10/2015 0
1447 3/15/2015 0
88459 1/6/2014 0
454878 12/8/2014 0
316 2/9/2014 0
316 3/18/2014 0
316 6/2/2014 0
316 9/28/2014 1
316 12/1/2014 0
;
proc sort data=have; by repid date; run;
data want;
do until(last.repid);
set have; by repid;
if not pos then output;
if positive then pos = 1;
end;
drop pos;
run;
proc print data=want noobs; run;
PG
Thanks PGStats!
Do you mind walking me through the code a bit? the last.repid takes the last observation for that repid and outputs all the pos=0 until it gets to positive=1. Why do we need to make the variable 'pos'? This seems like an extra step. Could we have the following code? If not, why not?
do until (last.repid);
set have;
by repid;
if positive ^=1 then output;
end;
Also, why are the set and by statements inside of the do loop?
Thanks for you help!
Variable pos only takes two values: missing or 1. It is set to missing at the beginning of every iteration of the implicit datastep loop. In a logical test, a missing value is taken as FALSE. Thus every record in a repid group is output until after pos is assigned a value, i.e. after positive=1 has been met. Once it is assigned a value, variable pos only goes back to missing at the beginning of the next repid group.
PG
Thanks, this is very helpful.
I have one more question about the last.repid function. I've added 4 more observations to the dataset:
412 | 11/4/2009 | 0 |
412 | 4/5/2013 | 1 |
412 | 3/14/2014 | 1 |
412 | 1/2/2015 | 0 |
I want to keep only the first 2 observations since I'm only interested in the volunteers until they become positive. I've used the same code you gave me and it does what I want. However, I thought last.repid would start from the bottom, work up the rows, giving me the last 2 observations. Why doesn't it?
Unless you use the point= option, the set statement reads dataset observations one by one in the order they are stored. last.repid is a variable created by SAS that takes the value of 1 (true) when the next record in the dataset is not from the same repid group.
Read the documentation about first. and last. automatic variables to understand what's going on.
SAS(R) 9.4 Language Reference: Concepts, Fourth Edition
PG
data have;
input repid date :mmddyy. positive;
format date yymmdd10.;
datalines;
125 1/12/2014 0
125 1/15/2014 1
1447 4/12/2013 0
1447 11/12/2013 0
1447 5/6/2014 0
1447 6/7/2014 0
1447 11/20/2014 1
1447 1/10/2015 0
1447 3/15/2015 0
88459 1/6/2014 0
454878 12/8/2014 0
316 2/9/2014 0
316 3/18/2014 0
316 6/2/2014 0
316 9/28/2014 1
316 12/1/2014 0
;
data want;
do _n_=1 by 1 until(last.repid);
set have;
by repid notsorted;
if positive then n=_n_;
end;
do _n_=1 by 1 until(last.repid);
set have;
by repid notsorted;
if _n_<=n or n=. then output;
end;
drop n;
run;
Regards,
Naveen Srinivasan
L&T Infotech
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.