Hi all,
The dataset is like below with accounts undergoing treatment R. At some point they will switch to treatment S. Once they switch to S they will not switch back to R. I would like to identify 2 consecutive records where an ID had R and then received S
Note: Some id's will never move to S
ID | date | type |
99997887 | 05-May-20 | R |
99997887 | 04-Jun-20 | R |
99998960 | 02-Jun-20 | S |
99998960 | 03-Jul-20 | S |
99998960 | 03-Aug-20 | S |
100003216 | 14-Jun-20 | R |
100003216 | 15-Jul-20 | R |
100005355 | 05-May-20 | R |
100005355 | 04-Jun-20 | R |
100005355 | 05-Jul-20 | R |
100005981 | 03-Jun-20 | R |
100005981 | 03-Aug-20 | S |
100009933 | 10-May-20 | R |
100009933 | 10-Jul-20 | R |
100009933 | 09-Aug-20 | R |
100010577 | 22-May-20 | R |
100010577 | 22-Jul-20 | S |
Next is a fixed tested code:
data have;
infile datalines dlm='09'x;
input ID date :date9. type :$1. ;
format date date9. ;
datalines;
99997887 05-May-20 R
99997887 04-Jun-20 R
99998960 02-Jun-20 S
99998960 03-Jul-20 S
99998960 03-Aug-20 S
100003216 14-Jun-20 R
100003216 15-Jul-20 R
100005355 05-May-20 R
100005355 04-Jun-20 R
100005355 05-Jul-20 R
100005981 03-Jun-20 R
100005981 03-Aug-20 S
100009933 10-May-20 R
100009933 10-Jul-20 R
100009933 09-Aug-20 R
100010577 22-May-20 R
100010577 22-Jul-20 S
run;
data want;
set have;
by ID;
retain lag_type lag_date;
lag_type = lag(type); lag_date=lag(date);
if first.ID then do; lag_type=" "; lag_date=.; end;
if ID in (100005981 , 100010577 ) then
putlog ID= date= type= lag_type= lag_date;
if not first.ID and type='S' and lag_type='R' then do;
R_type = lag_type; R_date=lag_date;
S_type = type; S_date = date;
output;
keep ID R_: S_: ; end; /* line added */
run;
Do you want a report? Or a dataset? What do you want it to look like?
A dataset so i can further analysis.
so i would want all id's where there was an event R and then an event S after.
I only care about extracting 2 rows per ID this way. R would be my champion and S would be the challenger
Assuming that (1) the data are sorted by ID/DATE, and (2) you want both the R record and the S record for all occasions when a given ID has an R immediately followed by an S:
data have;
input ID date :date9. type :$1. ;
format date date9. ;
datalines;
99997887 05-May-20 R
99997887 04-Jun-20 R
99998960 02-Jun-20 S
99998960 03-Jul-20 S
99998960 03-Aug-20 S
100003216 14-Jun-20 R
100003216 15-Jul-20 R
100005355 05-May-20 R
100005355 04-Jun-20 R
100005355 05-Jul-20 R
100005981 03-Jun-20 R
100005981 03-Aug-20 S
100009933 10-May-20 R
100009933 10-Jul-20 R
100009933 09-Aug-20 R
100010577 22-May-20 R
100010577 22-Jul-20 S
run;
data want (drop=nxt_:);
set have (keep=id);
by id;
merge have have (firstobs=2 keep=type rename=(type=nxt_type));
if (last.id=0 and type='R' and nxt_type='S')
or (first.id=0 and type='S' and lag(type)='R');
run;
It worked. Thank you so much. I was going to go the proc sql way by subsetting and then joining again but your method is fast and elegant. Thanks for the explanation. I would just like to compare this to the second solution posted and see if there is a variance in output. Your extract is what i was looking for. Your assumption is bang on.
You can subset two records as required and may be prefer next output format
having both records data in one observation:
data want;
set have;
by ID;
retain lag_type lag_date;
if first.ID then do; lag_type=" "; lag_date=.; end;
else do; lag_type = type; lag_date=date; end;
if not first.ID and type='S' and lag_type='R' then do;
R_type = lag_type; R_date=lag_date;
S_type = type; S_date = date;
output;
keep ID R_: S_: ;
end; /* line added */
run;
Thank you Shmuel,
But the output was blank... no errors. the second do block was not closed so i closed it like this:
Next is a fixed tested code:
data have;
infile datalines dlm='09'x;
input ID date :date9. type :$1. ;
format date date9. ;
datalines;
99997887 05-May-20 R
99997887 04-Jun-20 R
99998960 02-Jun-20 S
99998960 03-Jul-20 S
99998960 03-Aug-20 S
100003216 14-Jun-20 R
100003216 15-Jul-20 R
100005355 05-May-20 R
100005355 04-Jun-20 R
100005355 05-Jul-20 R
100005981 03-Jun-20 R
100005981 03-Aug-20 S
100009933 10-May-20 R
100009933 10-Jul-20 R
100009933 09-Aug-20 R
100010577 22-May-20 R
100010577 22-Jul-20 S
run;
data want;
set have;
by ID;
retain lag_type lag_date;
lag_type = lag(type); lag_date=lag(date);
if first.ID then do; lag_type=" "; lag_date=.; end;
if ID in (100005981 , 100010577 ) then
putlog ID= date= type= lag_type= lag_date;
if not first.ID and type='S' and lag_type='R' then do;
R_type = lag_type; R_date=lag_date;
S_type = type; S_date = date;
output;
keep ID R_: S_: ; end; /* line added */
run;
you may also try the below code
data want;
length val $10.;
do until(last.id);
set have;
by id date;
retain val;
if first.id then val=type;
else val=cats(val,type);
end;
do until(last.id);
set have;
by id date;
if index(val,'RS') then output;
end;
run;
Sorry for the delay all. I was on vacation and taking a break from all work. I tested the next 2 suggestions and they both work.
Thanks again 🙂
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.