I'm trying to analyze some clinical data where every patient has different number of observations based on the number of times they attended hospital for treatment therapy. A few patients were given one type of drug as first_line treatment. Some were given a combination of two different treatment therapies and therefore have two separate rows for the same treatment_date.
The data looks something like this:
id | trt_date | drug |
50 | 1/24/2010 | A |
50 | 1/24/2010 | B |
50 | 1/27/2010 | A |
50 | 1/27/2010 | B |
55 | 1/23/2010 | B |
55 | 1/24/2010 | A |
55 | 1/26/2010 | A |
55 | 1/26/2010 | B |
55 | 1/27/2010 | A |
55 | 1/27/2010 | B |
62 | 2/21/2010 | A |
62 | 2/21/2010 | B |
62 | 2/21/2010 | C |
70 | 1/21/2010 | B |
70 | 1/31/2010 | B |
70 | 2/10/2010 | B |
70 | 9/28/2012 | B |
70 | 10/8/2012 | B |
70 | 10/18/2012 | B |
I want an output dataset which only has the first treatment date/dates and would look like this:
id | trt_date | drug |
50 | 1/24/2010 | A |
50 | 1/24/2010 | B |
55 | 1/23/2010 | B |
62 | 2/21/2010 | A |
62 | 2/21/2010 | B |
62 | 2/21/2010 | C |
70 | 1/21/2010 | B |
I tried to use the retain function with the first. logic but that is only giving me the first treatment date and ignores the other observation for the same treatment date.
Any code suggestions?
I think the simplest is using PROC SQL like this
data have;
input id $ trt_date:mmddyy10. drug $;
format trt_date mmddyy10.;
datalines;
50 1/24/2010 A
50 1/24/2010 B
50 1/27/2010 A
50 1/27/2010 B
55 1/23/2010 B
55 1/24/2010 A
55 1/26/2010 A
55 1/26/2010 B
55 1/27/2010 A
55 1/27/2010 B
62 2/21/2010 A
62 2/21/2010 B
70 1/21/2010 B
70 1/31/2010 B
70 2/10/2010 B
70 9/28/2012 B
70 10/8/2012 B
70 10/18/2012 B
;
proc sql;
create table want as
select * from have
group by id
having trt_date=min(trt_date)
order by id, trt_date, drug;
quit;
I think the simplest is using PROC SQL like this
data have;
input id $ trt_date:mmddyy10. drug $;
format trt_date mmddyy10.;
datalines;
50 1/24/2010 A
50 1/24/2010 B
50 1/27/2010 A
50 1/27/2010 B
55 1/23/2010 B
55 1/24/2010 A
55 1/26/2010 A
55 1/26/2010 B
55 1/27/2010 A
55 1/27/2010 B
62 2/21/2010 A
62 2/21/2010 B
70 1/21/2010 B
70 1/31/2010 B
70 2/10/2010 B
70 9/28/2012 B
70 10/8/2012 B
70 10/18/2012 B
;
proc sql;
create table want as
select * from have
group by id
having trt_date=min(trt_date)
order by id, trt_date, drug;
quit;
Thanks draycut.
Do you think if there's a way to allot to the VISIT number to the dataset such that the data looks like this:
id | trt_date | drug | visit |
50 | 1/24/2010 | A | 1 |
50 | 1/24/2010 | B | 1 |
50 | 1/27/2010 | A | 2 |
50 | 1/27/2010 | B | 2 |
55 | 1/23/2010 | B | 1 |
55 | 1/24/2010 | A | 2 |
55 | 1/26/2010 | A | 3 |
55 | 1/26/2010 | B | 3 |
55 | 1/27/2010 | A | 4 |
55 | 1/27/2010 | B | 4 |
62 | 2/21/2010 | A | 1 |
62 | 2/21/2010 | B | 1 |
62 | 2/21/2010 | C | 1 |
70 | 1/21/2010 | B | 1 |
70 | 1/31/2010 | B | 2 |
70 | 2/10/2010 | B | 3 |
70 | 9/28/2012 | B | 4 |
70 | 10/8/2012 | B | 5 |
70 | 10/18/2012 | B | 6 |
I used the following code and got consecutive visit date for the same treatment date.
DATA TREATMENT;
SET TREATMENT;
BY ID;
IF FIRST.ID THEN VISIT = 1;
ELSE VISIT + 1;
RUN;
Not easy, nor efficient in sql. datastep is by far superior if your dataset is sorted
My approach still works of you add the visitnumber variable to the data (run the below code)
However, you do not specify if you want the same rows subsetted as before?
data have;
input id $ trt_date:mmddyy10. drug $ visit;
format trt_date mmddyy10.;
datalines;
50 1/24/2010 A 1
50 1/24/2010 B 1
50 1/27/2010 A 2
50 1/27/2010 B 2
55 1/23/2010 B 1
55 1/24/2010 A 2
55 1/26/2010 A 3
55 1/26/2010 B 3
55 1/27/2010 A 4
55 1/27/2010 B 4
62 2/21/2010 A 1
62 2/21/2010 B 1
62 2/21/2010 C 1
70 1/21/2010 B 1
70 1/31/2010 B 2
70 2/10/2010 B 3
70 9/28/2012 B 4
70 10/8/2012 B 5
70 10/18/2012 B 6
;
proc sql;
create table want as
select * from have
group by id
having trt_date=min(trt_date)
order by id, trt_date, drug;
quit;
You can add the visit variable with a simple data step like this, and run the SQL code on the TEMP data set
data temp;
set have;
by id trt_date;
retain visit;
if first.id then visit=0;
if first.trt_date then visit+1;
run;
untested as no sas software at home
data want;
set have;
by id date;
retain _date;
if first.id then do;_date=date;output;end;
else if date=_date then output;
run;
@Asquared, did you change your example data? 🙂
Anyway, my solution still works with the edited data:
data have;
input id $ trt_date:mmddyy10. drug $;
format trt_date mmddyy10.;
datalines;
50 1/24/2010 A
50 1/24/2010 B
50 1/27/2010 A
50 1/27/2010 B
55 1/23/2010 B
55 1/24/2010 A
55 1/26/2010 A
55 1/26/2010 B
55 1/27/2010 A
55 1/27/2010 B
62 2/21/2010 A
62 2/21/2010 B
62 2/21/2010 C
70 1/21/2010 B
70 1/31/2010 B
70 2/10/2010 B
70 9/28/2012 B
70 10/8/2012 B
70 10/18/2012 B
;
proc sql;
create table want as
select * from have
group by id
having trt_date=min(trt_date)
order by id, trt_date, drug;
quit;
@PeterClemmensen yeah, I added an observation just to make sure that the code would work in case the patient had more than two drug therapies for the same day. The code you provided worked smoothly. Thanks
Cool, glad it worked 🙂
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.