BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
TheNovice
Quartz | Level 8

Hi all,

 

The dataset is like below with accounts undergoing treatment R. At some point they will switch to treatment S. Once they switch to S they will not switch back to R. I would like to identify 2 consecutive records where an ID had R and then received S

 

Note: Some id's will never move to S

 

IDdatetype
9999788705-May-20R
9999788704-Jun-20R
9999896002-Jun-20S
9999896003-Jul-20S
9999896003-Aug-20S
10000321614-Jun-20R
10000321615-Jul-20R
10000535505-May-20R
10000535504-Jun-20R
10000535505-Jul-20R
10000598103-Jun-20R
10000598103-Aug-20S
10000993310-May-20R
10000993310-Jul-20R
10000993309-Aug-20R
10001057722-May-20R
10001057722-Jul-20S
1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

Next is a fixed tested code:

data have;
infile datalines dlm='09'x;
  input ID	date :date9. type :$1. ;
  format date date9. ;
datalines;
99997887	05-May-20	R
99997887	04-Jun-20	R
99998960	02-Jun-20	S
99998960	03-Jul-20	S
99998960	03-Aug-20	S
100003216	14-Jun-20	R
100003216	15-Jul-20	R
100005355	05-May-20	R
100005355	04-Jun-20	R
100005355	05-Jul-20	R
100005981	03-Jun-20	R
100005981	03-Aug-20	S
100009933	10-May-20	R
100009933	10-Jul-20	R
100009933	09-Aug-20	R
100010577	22-May-20	R
100010577	22-Jul-20	S
run;
data want;
 set have;
   by ID;
      retain lag_type  lag_date;
      lag_type = lag(type); lag_date=lag(date); 
      if first.ID then do; lag_type=" "; lag_date=.; end;

if ID in (100005981	, 100010577	) then 
    putlog  ID= date= type= lag_type= lag_date;

     if not first.ID and type='S' and lag_type='R' then do;
        R_type = lag_type; R_date=lag_date;
        S_type = type;  S_date = date;
        output;
       keep ID R_: S_: ;     end;    /* line added */
run;
         

View solution in original post

9 REPLIES 9
mkeintz
PROC Star

Do you want a report?  Or a dataset?  What do you want it to look like?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
TheNovice
Quartz | Level 8

A dataset so i can further analysis.

 

so i would want all id's where there was an event R and then an event S after.

 

I only care about extracting 2 rows per ID this way. R would be my champion and S would be the challenger

mkeintz
PROC Star

Assuming that (1) the data are sorted by ID/DATE, and (2) you want both the R record and the S record for all occasions when a given ID has an R immediately followed by an S:

 

 

data have;
  input ID	date :date9. type :$1. ;
  format date date9. ;
datalines;
99997887	05-May-20	R
99997887	04-Jun-20	R
99998960	02-Jun-20	S
99998960	03-Jul-20	S
99998960	03-Aug-20	S
100003216	14-Jun-20	R
100003216	15-Jul-20	R
100005355	05-May-20	R
100005355	04-Jun-20	R
100005355	05-Jul-20	R
100005981	03-Jun-20	R
100005981	03-Aug-20	S
100009933	10-May-20	R
100009933	10-Jul-20	R
100009933	09-Aug-20	R
100010577	22-May-20	R
100010577	22-Jul-20	S
run;

data want (drop=nxt_:);
  set have (keep=id);
  by id;
  merge have have (firstobs=2 keep=type rename=(type=nxt_type));

  if (last.id=0 and type='R' and nxt_type='S')
  or (first.id=0 and type='S' and lag(type)='R');
run;

 

  1. The SET statement only retrieves the ID variable, because (in combination with the subsequent BY statement) its only purpose is to provide a way to determine whether the record-in-hand is not the start of an ID (first.id=0) or not the end of an ID (last.id=0).
  2. The MERGE retrieves each record and simultaneously one variable (type renamed to nxt_type) from the following record.
  3. Once you know whether the record-in-hand is an R (not at the end of an ID) followed by an S,   or is an S (not at the beginning of an ID) preceded by an R, then keep it.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
TheNovice
Quartz | Level 8

It worked. Thank you so much. I was going to go the proc sql way by subsetting and then joining again but your method is fast and elegant. Thanks for the explanation. I would just like to compare this to the second solution posted and see if there is a variance in output. Your extract is what i was looking for. Your assumption is bang on. 

Shmuel
Garnet | Level 18

You can subset two records as required and may be prefer next output format

having both records data in one observation:

data want;
 set have;
   by ID;
      retain lag_type  lag_date;
      if first.ID then do; lag_type=" "; lag_date=.; end;
      else do; lag_type = type; lag_date=date; end;

     if not first.ID and type='S' and lag_type='R' then do;
        R_type = lag_type; R_date=lag_date;
        S_type = type;  S_date = date;
        output;
       keep ID R_: S_: ;
end; /* line added */ run;
TheNovice
Quartz | Level 8

Thank you Shmuel, 

 

But the output was blank... no errors. the second do block was not closed so i closed it like this:

 


data want1;
 set have;
   by ID;
      retain lag_type  lag_date;
      if first.ID then do; lag_type=" "; lag_date=.; end;
      else do; lag_type = type; lag_date=date; end;
     if not first.ID and type='Symend' and lag_type='Regular' then do;
        R_type = lag_type; R_date=lag_date;
        S_type = type;  S_date = date;
        output;
       keep ID R_: S_: ;end;
run;
 
The solution by mkeintz  worked. I would love to run both and see if there is a variance. Would appreciate you looking into it.
 
Thank you again
Shmuel
Garnet | Level 18

Next is a fixed tested code:

data have;
infile datalines dlm='09'x;
  input ID	date :date9. type :$1. ;
  format date date9. ;
datalines;
99997887	05-May-20	R
99997887	04-Jun-20	R
99998960	02-Jun-20	S
99998960	03-Jul-20	S
99998960	03-Aug-20	S
100003216	14-Jun-20	R
100003216	15-Jul-20	R
100005355	05-May-20	R
100005355	04-Jun-20	R
100005355	05-Jul-20	R
100005981	03-Jun-20	R
100005981	03-Aug-20	S
100009933	10-May-20	R
100009933	10-Jul-20	R
100009933	09-Aug-20	R
100010577	22-May-20	R
100010577	22-Jul-20	S
run;
data want;
 set have;
   by ID;
      retain lag_type  lag_date;
      lag_type = lag(type); lag_date=lag(date); 
      if first.ID then do; lag_type=" "; lag_date=.; end;

if ID in (100005981	, 100010577	) then 
    putlog  ID= date= type= lag_type= lag_date;

     if not first.ID and type='S' and lag_type='R' then do;
        R_type = lag_type; R_date=lag_date;
        S_type = type;  S_date = date;
        output;
       keep ID R_: S_: ;     end;    /* line added */
run;
         
Jagadishkatam
Amethyst | Level 16

you may also try the below code

 

data want;
length val $10.;
do until(last.id);
set have;
by id date;
retain val;
if first.id then val=type;
else val=cats(val,type);
end;
do until(last.id);
set have;
by id date;
if index(val,'RS') then output;
end;
run;
Thanks,
Jag
TheNovice
Quartz | Level 8

Sorry for the delay all. I was on vacation and taking a break from all work. I tested the next 2 suggestions and they both work.

 

Thanks again 🙂

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 891 views
  • 2 likes
  • 4 in conversation