Solved: Re: selecting specific rows by group condition in clinical data

Asquared · Posted 07-29-2018 03:21 PM

I'm trying to analyze some clinical data where every patient has different number of observations based on the number of times they attended hospital for treatment therapy. A few patients were given one type of drug as first_line treatment. Some were given a combination of two different treatment therapies and therefore have two separate rows for the same treatment_date.

The data looks something like this:

id	trt_date	drug
50	1/24/2010	A
50	1/24/2010	B
50	1/27/2010	A
50	1/27/2010	B
55	1/23/2010	B
55	1/24/2010	A
55	1/26/2010	A
55	1/26/2010	B
55	1/27/2010	A
55	1/27/2010	B
62	2/21/2010	A
62	2/21/2010	B
62	2/21/2010	C
70	1/21/2010	B
70	1/31/2010	B
70	2/10/2010	B
70	9/28/2012	B
70	10/8/2012	B
70	10/18/2012	B

I want an output dataset which only has the first treatment date/dates and would look like this:

id	trt_date	drug
50	1/24/2010	A
50	1/24/2010	B
55	1/23/2010	B
62	2/21/2010	A
62	2/21/2010	B
62	2/21/2010	C
70	1/21/2010	B

I tried to use the retain function with the first. logic but that is only giving me the first treatment date and ignores the other observation for the same treatment date.

Any code suggestions?

PeterClemmensen · Posted 07-29-2018 03:34 PM

I think the simplest is using PROC SQL like this

data have;
input id $ trt_date:mmddyy10. drug $;
format trt_date mmddyy10.;
datalines;
50 1/24/2010 A
50 1/24/2010 B
50 1/27/2010 A
50 1/27/2010 B
55 1/23/2010 B
55 1/24/2010 A
55 1/26/2010 A
55 1/26/2010 B
55 1/27/2010 A
55 1/27/2010 B
62 2/21/2010 A
62 2/21/2010 B
70 1/21/2010 B
70 1/31/2010 B
70 2/10/2010 B
70 9/28/2012 B
70 10/8/2012 B
70 10/18/2012 B
;

proc sql;
   create table want as
   select * from have
   group by id
   having trt_date=min(trt_date)
   order by id, trt_date, drug;
quit;

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

PeterClemmensen · Posted 07-29-2018 03:34 PM

I think the simplest is using PROC SQL like this

data have;
input id $ trt_date:mmddyy10. drug $;
format trt_date mmddyy10.;
datalines;
50 1/24/2010 A
50 1/24/2010 B
50 1/27/2010 A
50 1/27/2010 B
55 1/23/2010 B
55 1/24/2010 A
55 1/26/2010 A
55 1/26/2010 B
55 1/27/2010 A
55 1/27/2010 B
62 2/21/2010 A
62 2/21/2010 B
70 1/21/2010 B
70 1/31/2010 B
70 2/10/2010 B
70 9/28/2012 B
70 10/8/2012 B
70 10/18/2012 B
;

proc sql;
   create table want as
   select * from have
   group by id
   having trt_date=min(trt_date)
   order by id, trt_date, drug;
quit;

The DATA to DATA Step Macro
Blog: SASnrd

Asquared · Posted 07-29-2018 03:47 PM

Thanks draycut.

Do you think if there's a way to allot to the VISIT number to the dataset such that the data looks like this:

id	trt_date	drug	visit
50	1/24/2010	A	1
50	1/24/2010	B	1
50	1/27/2010	A	2
50	1/27/2010	B	2
55	1/23/2010	B	1
55	1/24/2010	A	2
55	1/26/2010	A	3
55	1/26/2010	B	3
55	1/27/2010	A	4
55	1/27/2010	B	4
62	2/21/2010	A	1
62	2/21/2010	B	1
62	2/21/2010	C	1
70	1/21/2010	B	1
70	1/31/2010	B	2
70	2/10/2010	B	3
70	9/28/2012	B	4
70	10/8/2012	B	5
70	10/18/2012	B	6

I used the following code and got consecutive visit date for the same treatment date.

DATA TREATMENT;
SET TREATMENT;
BY ID;
IF FIRST.ID THEN VISIT = 1;
ELSE VISIT + 1;
RUN;

novinosrin · Posted 07-29-2018 03:52 PM

Not easy, nor efficient in sql. datastep is by far superior if your dataset is sorted

PeterClemmensen · Posted 07-29-2018 03:53 PM

My approach still works of you add the visitnumber variable to the data (run the below code)

However, you do not specify if you want the same rows subsetted as before?

data have;
input id $ trt_date:mmddyy10. drug $ visit;
format trt_date mmddyy10.;
datalines;
50 1/24/2010 A 1
50 1/24/2010 B 1
50 1/27/2010 A 2
50 1/27/2010 B 2
55 1/23/2010 B 1
55 1/24/2010 A 2
55 1/26/2010 A 3
55 1/26/2010 B 3
55 1/27/2010 A 4
55 1/27/2010 B 4
62 2/21/2010 A 1
62 2/21/2010 B 1
62 2/21/2010 C 1
70 1/21/2010 B 1
70 1/31/2010 B 2
70 2/10/2010 B 3
70 9/28/2012 B 4
70 10/8/2012 B 5
70 10/18/2012 B 6
;


proc sql;
   create table want as
   select * from have
   group by id
   having trt_date=min(trt_date)
   order by id, trt_date, drug;
quit;

The DATA to DATA Step Macro
Blog: SASnrd

PeterClemmensen · Posted 07-29-2018 03:59 PM

You can add the visit variable with a simple data step like this, and run the SQL code on the TEMP data set

data temp;
   set have;
   by id trt_date;
   retain visit;
   if first.id then visit=0;
   if first.trt_date then visit+1;
run;

The DATA to DATA Step Macro
Blog: SASnrd

novinosrin · Posted 07-29-2018 03:39 PM

untested as no sas software at home

data want;

set have;

by id date;

retain _date;

if first.id then do;_date=date;output;end;

else if date=_date then output;

run;

PeterClemmensen · Posted 07-29-2018 03:41 PM

@Asquared, did you change your example data? 🙂

Anyway, my solution still works with the edited data:

data have;
input id $ trt_date:mmddyy10. drug $;
format trt_date mmddyy10.;
datalines;
50 1/24/2010 A
50 1/24/2010 B
50 1/27/2010 A
50 1/27/2010 B
55 1/23/2010 B
55 1/24/2010 A
55 1/26/2010 A
55 1/26/2010 B
55 1/27/2010 A
55 1/27/2010 B
62 2/21/2010 A
62 2/21/2010 B
62 2/21/2010 C
70 1/21/2010 B
70 1/31/2010 B
70 2/10/2010 B
70 9/28/2012 B
70 10/8/2012 B
70 10/18/2012 B
;

proc sql;
   create table want as
   select * from have
   group by id
   having trt_date=min(trt_date)
   order by id, trt_date, drug;
quit;

The DATA to DATA Step Macro
Blog: SASnrd

Asquared · Posted 07-29-2018 03:56 PM

@PeterClemmensen yeah, I added an observation just to make sure that the code would work in case the patient had more than two drug therapies for the same day. The code you provided worked smoothly. Thanks

PeterClemmensen · Posted 07-29-2018 04:00 PM

Cool, glad it worked 🙂

The DATA to DATA Step Macro
Blog: SASnrd

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away