BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
TheNovice
Quartz | Level 8

Hi all,

 

The dataset is like below with accounts undergoing treatment R. At some point they will switch to treatment S. Once they switch to S they will not switch back to R. I would like to identify 2 consecutive records where an ID had R and then received S

 

Note: Some id's will never move to S

 

IDdatetype
9999788705-May-20R
9999788704-Jun-20R
9999896002-Jun-20S
9999896003-Jul-20S
9999896003-Aug-20S
10000321614-Jun-20R
10000321615-Jul-20R
10000535505-May-20R
10000535504-Jun-20R
10000535505-Jul-20R
10000598103-Jun-20R
10000598103-Aug-20S
10000993310-May-20R
10000993310-Jul-20R
10000993309-Aug-20R
10001057722-May-20R
10001057722-Jul-20S
1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

Next is a fixed tested code:

data have;
infile datalines dlm='09'x;
  input ID	date :date9. type :$1. ;
  format date date9. ;
datalines;
99997887	05-May-20	R
99997887	04-Jun-20	R
99998960	02-Jun-20	S
99998960	03-Jul-20	S
99998960	03-Aug-20	S
100003216	14-Jun-20	R
100003216	15-Jul-20	R
100005355	05-May-20	R
100005355	04-Jun-20	R
100005355	05-Jul-20	R
100005981	03-Jun-20	R
100005981	03-Aug-20	S
100009933	10-May-20	R
100009933	10-Jul-20	R
100009933	09-Aug-20	R
100010577	22-May-20	R
100010577	22-Jul-20	S
run;
data want;
 set have;
   by ID;
      retain lag_type  lag_date;
      lag_type = lag(type); lag_date=lag(date); 
      if first.ID then do; lag_type=" "; lag_date=.; end;

if ID in (100005981	, 100010577	) then 
    putlog  ID= date= type= lag_type= lag_date;

     if not first.ID and type='S' and lag_type='R' then do;
        R_type = lag_type; R_date=lag_date;
        S_type = type;  S_date = date;
        output;
       keep ID R_: S_: ;     end;    /* line added */
run;
         

View solution in original post

9 REPLIES 9
mkeintz
PROC Star

Do you want a report?  Or a dataset?  What do you want it to look like?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
TheNovice
Quartz | Level 8

A dataset so i can further analysis.

 

so i would want all id's where there was an event R and then an event S after.

 

I only care about extracting 2 rows per ID this way. R would be my champion and S would be the challenger

mkeintz
PROC Star

Assuming that (1) the data are sorted by ID/DATE, and (2) you want both the R record and the S record for all occasions when a given ID has an R immediately followed by an S:

 

 

data have;
  input ID	date :date9. type :$1. ;
  format date date9. ;
datalines;
99997887	05-May-20	R
99997887	04-Jun-20	R
99998960	02-Jun-20	S
99998960	03-Jul-20	S
99998960	03-Aug-20	S
100003216	14-Jun-20	R
100003216	15-Jul-20	R
100005355	05-May-20	R
100005355	04-Jun-20	R
100005355	05-Jul-20	R
100005981	03-Jun-20	R
100005981	03-Aug-20	S
100009933	10-May-20	R
100009933	10-Jul-20	R
100009933	09-Aug-20	R
100010577	22-May-20	R
100010577	22-Jul-20	S
run;

data want (drop=nxt_:);
  set have (keep=id);
  by id;
  merge have have (firstobs=2 keep=type rename=(type=nxt_type));

  if (last.id=0 and type='R' and nxt_type='S')
  or (first.id=0 and type='S' and lag(type)='R');
run;

 

  1. The SET statement only retrieves the ID variable, because (in combination with the subsequent BY statement) its only purpose is to provide a way to determine whether the record-in-hand is not the start of an ID (first.id=0) or not the end of an ID (last.id=0).
  2. The MERGE retrieves each record and simultaneously one variable (type renamed to nxt_type) from the following record.
  3. Once you know whether the record-in-hand is an R (not at the end of an ID) followed by an S,   or is an S (not at the beginning of an ID) preceded by an R, then keep it.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
TheNovice
Quartz | Level 8

It worked. Thank you so much. I was going to go the proc sql way by subsetting and then joining again but your method is fast and elegant. Thanks for the explanation. I would just like to compare this to the second solution posted and see if there is a variance in output. Your extract is what i was looking for. Your assumption is bang on. 

Shmuel
Garnet | Level 18

You can subset two records as required and may be prefer next output format

having both records data in one observation:

data want;
 set have;
   by ID;
      retain lag_type  lag_date;
      if first.ID then do; lag_type=" "; lag_date=.; end;
      else do; lag_type = type; lag_date=date; end;

     if not first.ID and type='S' and lag_type='R' then do;
        R_type = lag_type; R_date=lag_date;
        S_type = type;  S_date = date;
        output;
       keep ID R_: S_: ;
end; /* line added */ run;
TheNovice
Quartz | Level 8

Thank you Shmuel, 

 

But the output was blank... no errors. the second do block was not closed so i closed it like this:

 


data want1;
 set have;
   by ID;
      retain lag_type  lag_date;
      if first.ID then do; lag_type=" "; lag_date=.; end;
      else do; lag_type = type; lag_date=date; end;
     if not first.ID and type='Symend' and lag_type='Regular' then do;
        R_type = lag_type; R_date=lag_date;
        S_type = type;  S_date = date;
        output;
       keep ID R_: S_: ;end;
run;
 
The solution by mkeintz  worked. I would love to run both and see if there is a variance. Would appreciate you looking into it.
 
Thank you again
Shmuel
Garnet | Level 18

Next is a fixed tested code:

data have;
infile datalines dlm='09'x;
  input ID	date :date9. type :$1. ;
  format date date9. ;
datalines;
99997887	05-May-20	R
99997887	04-Jun-20	R
99998960	02-Jun-20	S
99998960	03-Jul-20	S
99998960	03-Aug-20	S
100003216	14-Jun-20	R
100003216	15-Jul-20	R
100005355	05-May-20	R
100005355	04-Jun-20	R
100005355	05-Jul-20	R
100005981	03-Jun-20	R
100005981	03-Aug-20	S
100009933	10-May-20	R
100009933	10-Jul-20	R
100009933	09-Aug-20	R
100010577	22-May-20	R
100010577	22-Jul-20	S
run;
data want;
 set have;
   by ID;
      retain lag_type  lag_date;
      lag_type = lag(type); lag_date=lag(date); 
      if first.ID then do; lag_type=" "; lag_date=.; end;

if ID in (100005981	, 100010577	) then 
    putlog  ID= date= type= lag_type= lag_date;

     if not first.ID and type='S' and lag_type='R' then do;
        R_type = lag_type; R_date=lag_date;
        S_type = type;  S_date = date;
        output;
       keep ID R_: S_: ;     end;    /* line added */
run;
         
Jagadishkatam
Amethyst | Level 16

you may also try the below code

 

data want;
length val $10.;
do until(last.id);
set have;
by id date;
retain val;
if first.id then val=type;
else val=cats(val,type);
end;
do until(last.id);
set have;
by id date;
if index(val,'RS') then output;
end;
run;
Thanks,
Jag
TheNovice
Quartz | Level 8

Sorry for the delay all. I was on vacation and taking a break from all work. I tested the next 2 suggestions and they both work.

 

Thanks again 🙂

Catch up on SAS Innovate 2026

Dive into keynotes, announcements and breakthroughs on demand.

Explore Now →
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 3141 views
  • 2 likes
  • 4 in conversation