BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Alireza_Boloori
Fluorite | Level 6

Hello everyone,

 

I have a data like this:

 

patient_ID    drug_dose    date_start         date_stop

1                  10                 04/26/2008       11/10/2008

1                  5                   06/16/2008       02/02/2009

1                  7                   10/13/2008       11/12/2008

2                  3                   04/30/2008       10/27/2008

2                  11                 08/17/2008       05/14/2009

...

where all start/stop dates have closed intervals. Then, I want to have a data like this:

 

patient_ID    drug_dose    date_start         date_stop

1                  10                 04/26/2008       06/15/2008

1                  15                 06/16/2008       10/12/2008

1                  22                 10/13/2008       11/10/2008

1                  12                 11/11/2008       11/12/2008

1                   5                  11/13/2008       02/02/2009

2                   3                   04/30/2008      08/16/2008

2                  14                  08/17/2008      10/27/2008

2                  11                  10/28/2008      05/14/2009

...

 

In other words, I want to obtain the whole drug dosage for every time interval for each patient.

 

I'd be very thankful if anyone has any idea/suggestion!

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

OK then, the data set is large, but 3 years is not that many days. So here goes an array solution:

 

data have;
input patient_ID drug_dose (date_start date_stop) (:mmddyy10.);
format date_start date_stop yymmdd10.;
datalines;
1 10 04/26/2008 11/10/2008
1 5 06/16/2008 02/02/2009
1 7 10/13/2008 11/12/2008
2 3 04/30/2008 10/27/2008
2 11 08/17/2008 05/14/2009
;

data want;
/* Array subscripts should start before earliest date_start and 
   end after latest date_stop */
%let dstart=%sysfunc(mdy(12,31,2007));
%let dend=%sysfunc(mdy(1,1,2010));
array d{&dstart. : &dend.};
do until(last.patient_id);
    set have; by patient_id;
    do date = date_start to date_stop;
        d{date} = sum(d{date}, drug_dose);
        end;
    end;
do date = &dstart.+1 to &dend.;
    if d{date} ne d{date-1} then do;
        if not missing(date_start) then do;
            date_stop = date-1;
            drug_dose = d{date-1};
            if drug_dose > 0 then output;
            call missing(date_start, drug_dose);
            end;
        date_start = date;
        end;
    end;
format date yymmdd10.;
keep patient_id date_start date_stop drug_dose;
run;
PG

View solution in original post

12 REPLIES 12
PGStats
Opal | Level 21

If your data isn't too huge, you can simply expand and sum, like this:

 

data have;
input patient_ID drug_dose (date_start date_stop) (:mmddyy10.);
format date_start date_stop yymmdd10.;
datalines;
1 10 04/26/2008 11/10/2008
1 5 06/16/2008 02/02/2009
1 7 10/13/2008 11/12/2008
2 3 04/30/2008 10/27/2008
2 11 08/17/2008 05/14/2009
;

data expand;
set have;
do date = date_start to date_stop;
    output;
    end;
format date yymmdd10.;
keep patient_id date drug_dose;
run;

proc sql;
create table sumDoses as
select patient_id, date, sum(drug_dose) as drug_dose
from expand
group by patient_id, date
order by patient_id, date;
quit;

data want;
set sumDoses;
by patient_id drug_dose notsorted;
retain date_start;
if first.drug_dose then date_start =  date;
if last.drug_dose and drug_dose > 0 then do;
    date_stop = date;
    output;
    end;
format date_start date_stop yymmdd10.;
keep patient_id drug_dose date_start date_stop;
run;

Otherwise, if your time period isn't too long, you could use an array (one element per day).

 

PG
Alireza_Boloori
Fluorite | Level 6

@PGStats Thank you very much! However, based on your comments, my data has 2 problems: I have >= 40 million rows, and the time period is 3 years. Any recommendation?

PGStats
Opal | Level 21

OK then, the data set is large, but 3 years is not that many days. So here goes an array solution:

 

data have;
input patient_ID drug_dose (date_start date_stop) (:mmddyy10.);
format date_start date_stop yymmdd10.;
datalines;
1 10 04/26/2008 11/10/2008
1 5 06/16/2008 02/02/2009
1 7 10/13/2008 11/12/2008
2 3 04/30/2008 10/27/2008
2 11 08/17/2008 05/14/2009
;

data want;
/* Array subscripts should start before earliest date_start and 
   end after latest date_stop */
%let dstart=%sysfunc(mdy(12,31,2007));
%let dend=%sysfunc(mdy(1,1,2010));
array d{&dstart. : &dend.};
do until(last.patient_id);
    set have; by patient_id;
    do date = date_start to date_stop;
        d{date} = sum(d{date}, drug_dose);
        end;
    end;
do date = &dstart.+1 to &dend.;
    if d{date} ne d{date-1} then do;
        if not missing(date_start) then do;
            date_stop = date-1;
            drug_dose = d{date-1};
            if drug_dose > 0 then output;
            call missing(date_start, drug_dose);
            end;
        date_start = date;
        end;
    end;
format date yymmdd10.;
keep patient_id date_start date_stop drug_dose;
run;
PG
Alireza_Boloori
Fluorite | Level 6

@PGStats Thank you very much again!
When I run your code for the period from 12/31/2007 to 1/7/2011, it only gave the results for date_start = '01/01/2008' and date_stop = "01/02/2008'.

 

It also gave me this error i nthe log window:

ERROR: Array subscript out of range at line 317 column 23.

--> line 317 was related to this line of code: d{date} = sum(d{date}, MME_day);
d1=. d2=. d3=. d4=. d5=. .....d1099=0 d1100=0 d1101=0 d1102=0 d1103=0 d1104=0

last.ENROLID=1 ENROLID=41102 MME_day=0
date_start=08/10/2010 date_stop=02/06/2011

FIRST.ENROLID=0 date=01/08/2011 _ERROR_=1 _N_=7

PGStats
Opal | Level 21

Given that you have a date_stop of 2011-02-06, I think you meant the array limits to be

 

%let dstart=%sysfunc(mdy(12,31,2007));
%let dend=%sysfunc(mdy(7,1,2011));

 

Make sure these date limits encompass all of your data.

PG
Alireza_Boloori
Fluorite | Level 6

@PGStats Thank you very much! It resolved my problem.

If you don't mind, I have another question: If I have another drug (lets call it drug 2) with different dose (i.e., drug2_dose), how the code above will be changed?

PGStats
Opal | Level 21

How would you represent cases where a start-stop pair applies to one drug and not the other in your data structure?

PG
Alireza_Boloori
Fluorite | Level 6
Well, the start-stop pair applies to both. Basically, I do have two drugs prescribed simultaneously. So, for every start-stop pair, I do have two drugs.
I'm sorry for this as I should have mentioned this first.
PGStats
Opal | Level 21

Extension to two drugs is pretty straitforward. The data could also include zeros for drug1_dose or drug2_dose if only one was given over a certain period.

 

data have;
input patient_ID drug1_dose drug2_dose (date_start date_stop) (:mmddyy10.);
format date_start date_stop yymmdd10.;
datalines;
1 10  1  04/26/2008 11/10/2008
1 5  15  06/16/2008 02/02/2009
1 7  17  10/13/2008 11/12/2008
2 3  13  04/30/2008 10/27/2008
2 11  1  08/17/2008 05/14/2009
;

data want;

/* Array subscripts should start before earliest date_start and 
   end after latest date_stop */
%let dstart=%sysfunc(mdy(12,31,2007));
%let dend=%sysfunc(mdy(7,1,2011));

array d1{&dstart. : &dend.};
array d2{&dstart. : &dend.};
do until(last.patient_id);
    set have; by patient_id;
    do date = date_start to date_stop;
        d1{date} = sum(d1{date}, drug1_dose);
        d2{date} = sum(d2{date}, drug2_dose);
        end;
    end;
do date = &dstart.+1 to &dend.;
    if d1{date} ne d1{date-1} or d2{date} ne d2{date-1} then do;
        if not missing(date_start) then do;
            date_stop = date-1;
            drug1_dose = d1{date-1};
            drug2_dose = d2{date-1};
            if drug1_dose > 0 or drug2_dose > 0 then output;
            call missing(date_start, drug1_dose, drug2_dose);
            end;
        date_start = date;
        end;
    end;
format date yymmdd10.;
keep patient_id date_start date_stop drug1_dose drug2_dose;
run;

 

 

PG
Alireza_Boloori
Fluorite | Level 6
@PGStats I really appreciate your time and help!
Alireza_Boloori
Fluorite | Level 6

@PGStats Regarding the last code, what happens if I have another variable. Let say, this is the data that I have (where I add variable "category"):

data have;
input patient_ID category drug_dose (date_start date_stop) (:mmddyy10.);
format date_start date_stop yymmdd10.;
datalines;
1 1 10 04/26/2008 11/10/2008
1 1 5 06/16/2008 02/02/2009
1 2 7 10/13/2008 11/12/2008
2 1 3 04/30/2008 10/27/2008
2 2 11 08/17/2008 05/14/2009
;

and this the data I want to have

data want;
input patient_ID category drug_dose (date_start date_stop) (:mmddyy10.);
format date_start date_stop yymmdd10.;
datalines;
1 1 10 04/26/2008 06/15/2008
1 1 15 06/16/2008 11/10/2008
1 1 5 11/11/2008 02/02/2009
1 2 7 10/13/2008 11/12/2008
2 1 3 04/30/2008 10/27/2008
2 2 11 08/17/2008 05/14/2009
;

I tried the second option you provided (using sql and expanding the data), but my data is very huge and I run out of space. So, I'd be very thankful if you direct me in this case.

PGStats
Opal | Level 21

If your category takes values 1 and 2 then the last solution I provided (for drug1 and drug2) is simple to adapt and will not consume much memory.

 

You could also process each category separately for each patient by sorting by patientId and category and loop until(last.category).

 

If you run into problems, post the code you have tried and the messages and/or results that you got. 

PG

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 3920 views
  • 3 likes
  • 2 in conversation