BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sarahsasuser
Quartz | Level 8

I have an issue that I think might work best with a do until loop. I have patient data for multiple hospital admissions and I want to keep all the records up until and including their first positive test, but delete the records after the first positive record.

Here's the data I have:

repid datepostive
1251/12/20140
1251/15/20141
14474/12/20130
144711/12/20130
14475/6/20140
14476/7/20140
144711/20/20141
14471/10/20150
14473/15/20150
884591/6/20140
45487812/8/20140
3162/9/20140
3163/18/20140
3166/2/20140
3169/28/20141
31612/1/20140

Want:

repid datetive
1251/12/20140
1251/15/20141
14474/12/20130
144711/12/20130
14475/6/20140
14476/7/20140
144711/20/20141
884591/6/20140
45487812/8/20140
3162/9/20140
3163/18/20140
3166/2/20140
3169/28/20141

Thanks,

Sarah

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Maybe not the DO UNTIL you had in mind...

data have;

input repid date :mmddyy. positive;

format date yymmdd10.;

datalines;

125 1/12/2014 0

125 1/15/2014 1

1447 4/12/2013 0

1447 11/12/2013 0

1447 5/6/2014 0

1447 6/7/2014 0

1447 11/20/2014 1

1447 1/10/2015 0

1447 3/15/2015 0

88459 1/6/2014 0

454878 12/8/2014 0

316 2/9/2014 0

316 3/18/2014 0

316 6/2/2014 0

316 9/28/2014 1

316 12/1/2014 0

;

proc sort data=have; by repid date; run;

data want;

do until(last.repid);

    set have; by repid;

    if not pos then output;

    if positive then pos = 1;

    end;

drop pos;

run;

proc print data=want noobs; run;

PG

PG

View solution in original post

7 REPLIES 7
PGStats
Opal | Level 21

Maybe not the DO UNTIL you had in mind...

data have;

input repid date :mmddyy. positive;

format date yymmdd10.;

datalines;

125 1/12/2014 0

125 1/15/2014 1

1447 4/12/2013 0

1447 11/12/2013 0

1447 5/6/2014 0

1447 6/7/2014 0

1447 11/20/2014 1

1447 1/10/2015 0

1447 3/15/2015 0

88459 1/6/2014 0

454878 12/8/2014 0

316 2/9/2014 0

316 3/18/2014 0

316 6/2/2014 0

316 9/28/2014 1

316 12/1/2014 0

;

proc sort data=have; by repid date; run;

data want;

do until(last.repid);

    set have; by repid;

    if not pos then output;

    if positive then pos = 1;

    end;

drop pos;

run;

proc print data=want noobs; run;

PG

PG
sarahsasuser
Quartz | Level 8

Thanks PGStats!

Do you mind walking me through the code a bit? the last.repid takes the last observation for that repid and outputs all the pos=0 until it gets to positive=1. Why do we need to make the variable 'pos'? This seems like an extra step. Could we have the following code? If not, why not?

do until (last.repid);

     set have;

     by repid;

     if positive ^=1 then output;

end;

Also, why are the set and by statements inside of the do loop?

Thanks for you help!

PGStats
Opal | Level 21

Variable pos only takes two values: missing or 1. It is set to missing at the beginning of every iteration of the implicit datastep loop. In a logical test, a missing value is taken as FALSE. Thus every record in a repid group is output until after pos is assigned a value, i.e. after positive=1 has been met. Once it is assigned a value, variable pos only goes back to missing at the beginning of the next repid group.

PG

PG
sarahsasuser
Quartz | Level 8

Thanks, this is very helpful.

sarahsasuser
Quartz | Level 8

I have one more question about the last.repid function. I've added 4 more observations to the dataset:

41211/4/20090
4124/5/20131
4123/14/20141
4121/2/20150

I want to keep only the first 2 observations since I'm only interested in the volunteers until they become positive. I've used the same code you gave me and it does what I want. However, I thought last.repid would start from the bottom, work up the rows, giving me the last 2 observations. Why doesn't it?

PGStats
Opal | Level 21

Unless you use the point= option, the set statement reads dataset observations one by one in the order they are stored. last.repid is a variable created by SAS that takes the value of 1 (true) when the next record in the dataset is not from the same repid group.

Read the documentation about first. and last. automatic variables to understand what's going on.

SAS(R) 9.4 Language Reference: Concepts, Fourth Edition

PG

PG
naveen_srini
Quartz | Level 8

data have;

input repid date :mmddyy. positive;

format date yymmdd10.;

datalines;

125 1/12/2014 0

125 1/15/2014 1

1447 4/12/2013 0

1447 11/12/2013 0

1447 5/6/2014 0

1447 6/7/2014 0

1447 11/20/2014 1

1447 1/10/2015 0

1447 3/15/2015 0

88459 1/6/2014 0

454878 12/8/2014 0

316 2/9/2014 0

316 3/18/2014 0

316 6/2/2014 0

316 9/28/2014 1

316 12/1/2014 0

;

data want;

do _n_=1 by 1 until(last.repid);

set have;

by repid notsorted;

if positive    then n=_n_;

end; 

do _n_=1 by 1 until(last.repid);

set have;

by repid notsorted;

if _n_<=n or n=. then output;

end;

drop n;

run;

Regards,

Naveen Srinivasan

L&T Infotech

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 8827 views
  • 6 likes
  • 3 in conversation