- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have an issue that I think might work best with a do until loop. I have patient data for multiple hospital admissions and I want to keep all the records up until and including their first positive test, but delete the records after the first positive record.
Here's the data I have:
| repid | date | postive |
| 125 | 1/12/2014 | 0 |
| 125 | 1/15/2014 | 1 |
| 1447 | 4/12/2013 | 0 |
| 1447 | 11/12/2013 | 0 |
| 1447 | 5/6/2014 | 0 |
| 1447 | 6/7/2014 | 0 |
| 1447 | 11/20/2014 | 1 |
| 1447 | 1/10/2015 | 0 |
| 1447 | 3/15/2015 | 0 |
| 88459 | 1/6/2014 | 0 |
| 454878 | 12/8/2014 | 0 |
| 316 | 2/9/2014 | 0 |
| 316 | 3/18/2014 | 0 |
| 316 | 6/2/2014 | 0 |
| 316 | 9/28/2014 | 1 |
| 316 | 12/1/2014 | 0 |
Want:
| repid | date | tive |
| 125 | 1/12/2014 | 0 |
| 125 | 1/15/2014 | 1 |
| 1447 | 4/12/2013 | 0 |
| 1447 | 11/12/2013 | 0 |
| 1447 | 5/6/2014 | 0 |
| 1447 | 6/7/2014 | 0 |
| 1447 | 11/20/2014 | 1 |
| 88459 | 1/6/2014 | 0 |
| 454878 | 12/8/2014 | 0 |
| 316 | 2/9/2014 | 0 |
| 316 | 3/18/2014 | 0 |
| 316 | 6/2/2014 | 0 |
| 316 | 9/28/2014 | 1 |
Thanks,
Sarah
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Maybe not the DO UNTIL you had in mind...
data have;
input repid date :mmddyy. positive;
format date yymmdd10.;
datalines;
125 1/12/2014 0
125 1/15/2014 1
1447 4/12/2013 0
1447 11/12/2013 0
1447 5/6/2014 0
1447 6/7/2014 0
1447 11/20/2014 1
1447 1/10/2015 0
1447 3/15/2015 0
88459 1/6/2014 0
454878 12/8/2014 0
316 2/9/2014 0
316 3/18/2014 0
316 6/2/2014 0
316 9/28/2014 1
316 12/1/2014 0
;
proc sort data=have; by repid date; run;
data want;
do until(last.repid);
set have; by repid;
if not pos then output;
if positive then pos = 1;
end;
drop pos;
run;
proc print data=want noobs; run;
PG
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Maybe not the DO UNTIL you had in mind...
data have;
input repid date :mmddyy. positive;
format date yymmdd10.;
datalines;
125 1/12/2014 0
125 1/15/2014 1
1447 4/12/2013 0
1447 11/12/2013 0
1447 5/6/2014 0
1447 6/7/2014 0
1447 11/20/2014 1
1447 1/10/2015 0
1447 3/15/2015 0
88459 1/6/2014 0
454878 12/8/2014 0
316 2/9/2014 0
316 3/18/2014 0
316 6/2/2014 0
316 9/28/2014 1
316 12/1/2014 0
;
proc sort data=have; by repid date; run;
data want;
do until(last.repid);
set have; by repid;
if not pos then output;
if positive then pos = 1;
end;
drop pos;
run;
proc print data=want noobs; run;
PG
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks PGStats!
Do you mind walking me through the code a bit? the last.repid takes the last observation for that repid and outputs all the pos=0 until it gets to positive=1. Why do we need to make the variable 'pos'? This seems like an extra step. Could we have the following code? If not, why not?
do until (last.repid);
set have;
by repid;
if positive ^=1 then output;
end;
Also, why are the set and by statements inside of the do loop?
Thanks for you help!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Variable pos only takes two values: missing or 1. It is set to missing at the beginning of every iteration of the implicit datastep loop. In a logical test, a missing value is taken as FALSE. Thus every record in a repid group is output until after pos is assigned a value, i.e. after positive=1 has been met. Once it is assigned a value, variable pos only goes back to missing at the beginning of the next repid group.
PG
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks, this is very helpful.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have one more question about the last.repid function. I've added 4 more observations to the dataset:
| 412 | 11/4/2009 | 0 |
| 412 | 4/5/2013 | 1 |
| 412 | 3/14/2014 | 1 |
| 412 | 1/2/2015 | 0 |
I want to keep only the first 2 observations since I'm only interested in the volunteers until they become positive. I've used the same code you gave me and it does what I want. However, I thought last.repid would start from the bottom, work up the rows, giving me the last 2 observations. Why doesn't it?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Unless you use the point= option, the set statement reads dataset observations one by one in the order they are stored. last.repid is a variable created by SAS that takes the value of 1 (true) when the next record in the dataset is not from the same repid group.
Read the documentation about first. and last. automatic variables to understand what's going on.
SAS(R) 9.4 Language Reference: Concepts, Fourth Edition
PG
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input repid date :mmddyy. positive;
format date yymmdd10.;
datalines;
125 1/12/2014 0
125 1/15/2014 1
1447 4/12/2013 0
1447 11/12/2013 0
1447 5/6/2014 0
1447 6/7/2014 0
1447 11/20/2014 1
1447 1/10/2015 0
1447 3/15/2015 0
88459 1/6/2014 0
454878 12/8/2014 0
316 2/9/2014 0
316 3/18/2014 0
316 6/2/2014 0
316 9/28/2014 1
316 12/1/2014 0
;
data want;
do _n_=1 by 1 until(last.repid);
set have;
by repid notsorted;
if positive then n=_n_;
end;
do _n_=1 by 1 until(last.repid);
set have;
by repid notsorted;
if _n_<=n or n=. then output;
end;
drop n;
run;
Regards,
Naveen Srinivasan
L&T Infotech