Hi all,
The dataset is like below with accounts undergoing treatment R. At some point they will switch to treatment S. Once they switch to S they will not switch back to R. I would like to identify 2 consecutive records where an ID had R and then received S
Note: Some id's will never move to S
| ID | date | type |
| 99997887 | 05-May-20 | R |
| 99997887 | 04-Jun-20 | R |
| 99998960 | 02-Jun-20 | S |
| 99998960 | 03-Jul-20 | S |
| 99998960 | 03-Aug-20 | S |
| 100003216 | 14-Jun-20 | R |
| 100003216 | 15-Jul-20 | R |
| 100005355 | 05-May-20 | R |
| 100005355 | 04-Jun-20 | R |
| 100005355 | 05-Jul-20 | R |
| 100005981 | 03-Jun-20 | R |
| 100005981 | 03-Aug-20 | S |
| 100009933 | 10-May-20 | R |
| 100009933 | 10-Jul-20 | R |
| 100009933 | 09-Aug-20 | R |
| 100010577 | 22-May-20 | R |
| 100010577 | 22-Jul-20 | S |
Next is a fixed tested code:
data have;
infile datalines dlm='09'x;
input ID date :date9. type :$1. ;
format date date9. ;
datalines;
99997887 05-May-20 R
99997887 04-Jun-20 R
99998960 02-Jun-20 S
99998960 03-Jul-20 S
99998960 03-Aug-20 S
100003216 14-Jun-20 R
100003216 15-Jul-20 R
100005355 05-May-20 R
100005355 04-Jun-20 R
100005355 05-Jul-20 R
100005981 03-Jun-20 R
100005981 03-Aug-20 S
100009933 10-May-20 R
100009933 10-Jul-20 R
100009933 09-Aug-20 R
100010577 22-May-20 R
100010577 22-Jul-20 S
run;
data want;
set have;
by ID;
retain lag_type lag_date;
lag_type = lag(type); lag_date=lag(date);
if first.ID then do; lag_type=" "; lag_date=.; end;
if ID in (100005981 , 100010577 ) then
putlog ID= date= type= lag_type= lag_date;
if not first.ID and type='S' and lag_type='R' then do;
R_type = lag_type; R_date=lag_date;
S_type = type; S_date = date;
output;
keep ID R_: S_: ; end; /* line added */
run;
Do you want a report? Or a dataset? What do you want it to look like?
A dataset so i can further analysis.
so i would want all id's where there was an event R and then an event S after.
I only care about extracting 2 rows per ID this way. R would be my champion and S would be the challenger
Assuming that (1) the data are sorted by ID/DATE, and (2) you want both the R record and the S record for all occasions when a given ID has an R immediately followed by an S:
data have;
input ID date :date9. type :$1. ;
format date date9. ;
datalines;
99997887 05-May-20 R
99997887 04-Jun-20 R
99998960 02-Jun-20 S
99998960 03-Jul-20 S
99998960 03-Aug-20 S
100003216 14-Jun-20 R
100003216 15-Jul-20 R
100005355 05-May-20 R
100005355 04-Jun-20 R
100005355 05-Jul-20 R
100005981 03-Jun-20 R
100005981 03-Aug-20 S
100009933 10-May-20 R
100009933 10-Jul-20 R
100009933 09-Aug-20 R
100010577 22-May-20 R
100010577 22-Jul-20 S
run;
data want (drop=nxt_:);
set have (keep=id);
by id;
merge have have (firstobs=2 keep=type rename=(type=nxt_type));
if (last.id=0 and type='R' and nxt_type='S')
or (first.id=0 and type='S' and lag(type)='R');
run;
It worked. Thank you so much. I was going to go the proc sql way by subsetting and then joining again but your method is fast and elegant. Thanks for the explanation. I would just like to compare this to the second solution posted and see if there is a variance in output. Your extract is what i was looking for. Your assumption is bang on.
You can subset two records as required and may be prefer next output format
having both records data in one observation:
data want;
set have;
by ID;
retain lag_type lag_date;
if first.ID then do; lag_type=" "; lag_date=.; end;
else do; lag_type = type; lag_date=date; end;
if not first.ID and type='S' and lag_type='R' then do;
R_type = lag_type; R_date=lag_date;
S_type = type; S_date = date;
output;
keep ID R_: S_: ;
end; /* line added */
run;
Thank you Shmuel,
But the output was blank... no errors. the second do block was not closed so i closed it like this:
Next is a fixed tested code:
data have;
infile datalines dlm='09'x;
input ID date :date9. type :$1. ;
format date date9. ;
datalines;
99997887 05-May-20 R
99997887 04-Jun-20 R
99998960 02-Jun-20 S
99998960 03-Jul-20 S
99998960 03-Aug-20 S
100003216 14-Jun-20 R
100003216 15-Jul-20 R
100005355 05-May-20 R
100005355 04-Jun-20 R
100005355 05-Jul-20 R
100005981 03-Jun-20 R
100005981 03-Aug-20 S
100009933 10-May-20 R
100009933 10-Jul-20 R
100009933 09-Aug-20 R
100010577 22-May-20 R
100010577 22-Jul-20 S
run;
data want;
set have;
by ID;
retain lag_type lag_date;
lag_type = lag(type); lag_date=lag(date);
if first.ID then do; lag_type=" "; lag_date=.; end;
if ID in (100005981 , 100010577 ) then
putlog ID= date= type= lag_type= lag_date;
if not first.ID and type='S' and lag_type='R' then do;
R_type = lag_type; R_date=lag_date;
S_type = type; S_date = date;
output;
keep ID R_: S_: ; end; /* line added */
run;
you may also try the below code
data want;
length val $10.;
do until(last.id);
set have;
by id date;
retain val;
if first.id then val=type;
else val=cats(val,type);
end;
do until(last.id);
set have;
by id date;
if index(val,'RS') then output;
end;
run;
Sorry for the delay all. I was on vacation and taking a break from all work. I tested the next 2 suggestions and they both work.
Thanks again 🙂
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.