Hi, I am working on the project and stuck at this point.
I have a data with id, start date, end date for prescribed drug, drug as below.
This is the list of treatment days for certain drug for each patient.
I need to keep the sequence of prescribed drug.
The data 'have' is like
ID start_date end_date drug
~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 2003-01-01 2003-01-28 A
1 2003-02-01 2003-03-01 A
1 2003-03-15 2003-03-31 A
1 2003-04-11 2003-04-30 B
1 2003-06-01 2003-07-28 B
1 2003-07-31 2003-08-15 B
1 2003-09-01 2003-09-28 A
1 2003-10-01 2003-11-28 A
2 2005-04-30 2005-06-01 C
........
.....
...
if I use the code like
proc sql;
create table want as
select id, min(start_date) as start, max(end_date) as end, drug
from have
group by drug;
quit;
I would have a data like
id start_date end_date drug
~~~~~~~~~~~~~~~~~~~~~~~~~
1 2003-01-01 2003-11-28 A
1 2003-04-11 2003-08-15 B
this is wrong.
The output I want is
id start_date end_date drug
~~~~~~~~~~~~~~~~~~~~~~~~~
1 2003-01-01 2003-03-31 A
1 2003-04-11 2003-08-15 B
1 2003-09-01 2003-11-28 A
........
.....
I need to keep the sequence of drug and get the min. date and max. date for each sequence of drug.
anybody know what to do? I tried first.last. retain..... etc..
i am really stuck here.. please help me out!!
data have;
input ID (start_date end_date) (:yymmdd10.) drug $;
format start_date end_date yymmdd10.;
cards;
1 2003-01-01 2003-01-28 A
1 2003-02-01 2003-03-01 A
1 2003-03-15 2003-03-31 A
1 2003-04-11 2003-04-30 B
1 2003-06-01 2003-07-28 B
1 2003-07-31 2003-08-15 B
1 2003-09-01 2003-09-28 A
1 2003-10-01 2003-11-28 A
;
data want;
if 0 then set have;
call missing(start,end);
do until(last.drug);
set have;
by id drug notsorted;
start=min(start,start_date);
end=max(end,end_date);
end;
format start end yymmdd10.;
keep id start end drug;
run;
You need to create a sequence variable somehow, assuming your input data is sorted properly.
UNTESTED CODE
data have2;
set have;
previous_id=lag(id);
previous_drug=lag(drug);
if id=previous_id and drug^=previous_drug then sequence+1;
run;
proc summary data=have2;
class id drug sequence;
var start_date end_date;
output out=want min(start_date)= max(end_date)=;
run;
Try adding the NOTSORTED options to a BY statement within PROC SUMMARY or MEANS.
proc summary data=have2;
by id drug notsorted;
var start_date end_date;
output out=want min(start_date)= max(end_date)=;
run;
data have2;
set have;
by id;
previous_drug=lag(drug);
if first.id then previous_drug='';
if drug^=previous_drug then sequence+1;
if first.id then sequence=1;
run;
proc summary data=have2;
by id drug notsorted;
var start_date end_date ;
output out=want min(start_date)= max(end_date)= ;
run;
I use the code above.. It works perfectly as I wanted!
I really appreciated your help!!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.