BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Asquared
Calcite | Level 5

I'm trying to analyze some clinical data where every patient has different number of observations based on the number of times they attended hospital for treatment therapy. A few patients were given one type of drug as first_line treatment. Some were given a combination of two different treatment therapies and therefore have two separate rows for the same treatment_date.

The data looks something like this:


  

idtrt_datedrug
501/24/2010A
501/24/2010B
501/27/2010A
501/27/2010B
551/23/2010B
551/24/2010A
551/26/2010A
551/26/2010B
551/27/2010A
551/27/2010B
622/21/2010A
622/21/2010B
622/21/2010C
701/21/2010B
701/31/2010B
702/10/2010B
709/28/2012B
7010/8/2012B
7010/18/2012B

 

I want an output dataset which only has the first treatment date/dates and would look like this:

 

idtrt_datedrug
501/24/2010A
501/24/2010B
551/23/2010B
622/21/2010A
622/21/2010B
622/21/2010C
701/21/2010B


 I tried to use the retain function with the first. logic but that is only giving me the first treatment date and ignores the other observation for the same treatment date. 


Any code suggestions?

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

I think the simplest is using PROC SQL like this

 

data have;
input id $ trt_date:mmddyy10. drug $;
format trt_date mmddyy10.;
datalines;
50 1/24/2010 A
50 1/24/2010 B
50 1/27/2010 A
50 1/27/2010 B
55 1/23/2010 B
55 1/24/2010 A
55 1/26/2010 A
55 1/26/2010 B
55 1/27/2010 A
55 1/27/2010 B
62 2/21/2010 A
62 2/21/2010 B
70 1/21/2010 B
70 1/31/2010 B
70 2/10/2010 B
70 9/28/2012 B
70 10/8/2012 B
70 10/18/2012 B
;

proc sql;
   create table want as
   select * from have
   group by id
   having trt_date=min(trt_date)
   order by id, trt_date, drug;
quit;

View solution in original post

9 REPLIES 9
PeterClemmensen
Tourmaline | Level 20

I think the simplest is using PROC SQL like this

 

data have;
input id $ trt_date:mmddyy10. drug $;
format trt_date mmddyy10.;
datalines;
50 1/24/2010 A
50 1/24/2010 B
50 1/27/2010 A
50 1/27/2010 B
55 1/23/2010 B
55 1/24/2010 A
55 1/26/2010 A
55 1/26/2010 B
55 1/27/2010 A
55 1/27/2010 B
62 2/21/2010 A
62 2/21/2010 B
70 1/21/2010 B
70 1/31/2010 B
70 2/10/2010 B
70 9/28/2012 B
70 10/8/2012 B
70 10/18/2012 B
;

proc sql;
   create table want as
   select * from have
   group by id
   having trt_date=min(trt_date)
   order by id, trt_date, drug;
quit;
Asquared
Calcite | Level 5

Thanks draycut.

Do you think if there's a way to allot to the VISIT number to the dataset such that the data looks like this:

 

idtrt_datedrugvisit
501/24/2010A1
501/24/2010B1
501/27/2010A2
501/27/2010B2
551/23/2010B1
551/24/2010A2
551/26/2010A3
551/26/2010B3
551/27/2010A4
551/27/2010B4
622/21/2010A1
622/21/2010B1
622/21/2010C1
701/21/2010B1
701/31/2010B2
702/10/2010B3
709/28/2012B4
7010/8/2012B5
7010/18/2012B6

 

I used the following code and got consecutive visit date for the same treatment date. 

 

DATA TREATMENT;
SET TREATMENT;
BY ID;
IF FIRST.ID THEN VISIT = 1;
ELSE VISIT + 1;
RUN;

novinosrin
Tourmaline | Level 20

Not easy, nor efficient in sql. datastep is by far superior if your dataset is sorted

PeterClemmensen
Tourmaline | Level 20

My approach still works of you add the visitnumber variable to the data (run the below code)

 

However, you do not specify if you want the same rows subsetted as before?

 

data have;
input id $ trt_date:mmddyy10. drug $ visit;
format trt_date mmddyy10.;
datalines;
50 1/24/2010 A 1
50 1/24/2010 B 1
50 1/27/2010 A 2
50 1/27/2010 B 2
55 1/23/2010 B 1
55 1/24/2010 A 2
55 1/26/2010 A 3
55 1/26/2010 B 3
55 1/27/2010 A 4
55 1/27/2010 B 4
62 2/21/2010 A 1
62 2/21/2010 B 1
62 2/21/2010 C 1
70 1/21/2010 B 1
70 1/31/2010 B 2
70 2/10/2010 B 3
70 9/28/2012 B 4
70 10/8/2012 B 5
70 10/18/2012 B 6
;


proc sql;
   create table want as
   select * from have
   group by id
   having trt_date=min(trt_date)
   order by id, trt_date, drug;
quit;
PeterClemmensen
Tourmaline | Level 20

You can add the visit variable with a simple data step like this, and run the SQL code on the TEMP data set

 

data temp;
   set have;
   by id trt_date;
   retain visit;
   if first.id then visit=0;
   if first.trt_date then visit+1;
run;
novinosrin
Tourmaline | Level 20

untested as no sas software at home

 

data want;

set have;

by id date;

retain _date;

if first.id then do;_date=date;output;end;

else if date=_date then output;

run;

PeterClemmensen
Tourmaline | Level 20

@Asquared, did you change your example data? 🙂

 

Anyway, my solution still works with the edited data:

 

data have;
input id $ trt_date:mmddyy10. drug $;
format trt_date mmddyy10.;
datalines;
50 1/24/2010 A
50 1/24/2010 B
50 1/27/2010 A
50 1/27/2010 B
55 1/23/2010 B
55 1/24/2010 A
55 1/26/2010 A
55 1/26/2010 B
55 1/27/2010 A
55 1/27/2010 B
62 2/21/2010 A
62 2/21/2010 B
62 2/21/2010 C
70 1/21/2010 B
70 1/31/2010 B
70 2/10/2010 B
70 9/28/2012 B
70 10/8/2012 B
70 10/18/2012 B
;

proc sql;
   create table want as
   select * from have
   group by id
   having trt_date=min(trt_date)
   order by id, trt_date, drug;
quit;
Asquared
Calcite | Level 5

@PeterClemmensen yeah, I added an observation just to make sure that the code would work in case the patient had more than two drug therapies for the same day. The code you provided worked smoothly. Thanks

 

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1262 views
  • 0 likes
  • 3 in conversation