Solved: Re: Split an observation into two rows and modify them

Alireza_Boloori · Posted 10-26-2018 04:00 PM

Hello everyone,

I have a data of prescription drug supplies. I want to adjust this data based on a time window of 90 days. Here are what I have and what I want:

data have;
input id date_start:yymmdd10. date_stop:yymmdd10. duration increment_start increment_stop;
datalines;
1	1/15/2008	1/30/2008	16	1	1
1	1/31/2008	2/14/2008	15	1	1
1	4/13/2008	5/2/2008	20	1	2
1	5/30/2008	6/29/2008	31	2	2
1	7/7/2008	8/6/2008	31	2	3
2	1/24/2008	2/18/2008	26	1	1
2	2/19/2008	2/19/2008	1	1	1
2	4/21/2008	5/11/2008	21	1	2
2	5/12/2008	5/13/2008	2	2	2
2	7/11/2008	8/3/2008	24	2	3
;

data want;
input id date_start:yymmdd10. date_stop:yymmdd10. duration increment_start increment_stop;
datalines;
1	1/15/2008	1/30/2008	16	1	1
1	1/31/2008	2/14/2008	15	1	1
1	4/13/2008	4/13/2008	1	1	1
1	4/14/2008	5/2/2008	19	2	2
1	5/30/2008	6/29/2008	31	2	2
1	7/7/2008	7/12/2008	6	2	2
1	7/13/2008	8/6/2008	25	3	3
2	1/24/2008	2/18/2008	26	1	1
2	2/19/2008	2/19/2008	1	1	1
2	4/21/2008	4/22/2008	2	1	1
2	4/23/2008	5/11/2008	19	2	2
2	5/12/2008	5/13/2008	2	2	2
2	7/11/2008	7/21/2008	11	2	2
2	7/22/2008	8/3/2008	13	3	3
;

"date_start" and "date_stop" are the beginning and end of a drug supply for each observation.
"duration" is the number of days for the corresponding drug supply; i.e., duration = date_stop - date_start
For each patient, consider "date_start" for the first observation as the baseline time; e.g., 1/15/2008 is the baseline time for patient 1.
"increment_start" and "increment_stop" indicate whether the whole drug supply period for each observation is within an increment of 90 days from the baseline time. If these two variables are not equal, that means a part of the drug supply is not within that 90 days.
So, I have to split that drug supply (time-wise), so that the first part is within the first 90 days, and the second part is within the second 90 days. For example, you can compare row 3 in data have with rows 3-4 in data want.

Any idea or help is much appreciated!

PGStats · Posted 10-27-2018 12:42 AM

This will also take care of durations longer than 90 days

data want;
retain is0dt;
set have; by id;
if first.id then is0dt = date_start;
ds = date_stop;
do while (date_start <= ds);
    inc = 1 + intck("day90", is0dt, date_start, "c");
    date_stop = min(ds, intnx("day", is0dt, inc*90-1));
    output;
    date_start = intnx("day", date_stop, 1);
    end;
drop is0dt ds;
run;

PG

View solution in original post

ballardw · Posted 10-26-2018 04:13 PM

@Alireza_Boloori wrote:

Hello everyone,

For example, for the third observation for ID = 1, the supply of drug is started before day 90, while the end date of supply is after 90 days. So, I had to split that observation into two rows, and also modify each row (e.g., data_start, data_stop, duration, etc.)

In term of the variables shown how do we know that the " supply of drug is started before day 90" and "the end date of supply is after 90 ". I might guess correctly but the problem should include the variable names used and rules involving the variables. I am guessing that the day_counter variables are involved but it is not explicitly clear which is start or end, similar with the time_window variables.

There seems to be an implied manipulation of the dates and duration_day variable but that rule is not very clear.

You likely know a whole lot about what each of these are because you have been working with them but we have little or not context and it helps if we don't have to guess which what goes where.

Alireza_Boloori · Posted 10-26-2018 07:44 PM

Thanks! I updated my descriptions. So, hopefully it'll be clear this time.

VDD · Posted 10-26-2018 09:46 PM

@Alireza_Boloorican you please post sample code that you are working with?

what have you tried?

You have a lot listed above with rules that are still unclear based on data that is not present in your sample data due to patient engagement in fulfilling prescriptions or having those prescriptions renewed.

Please post what you have tried in the form of a datastep, SQL query.

Thanks

Alireza_Boloori · Posted 10-26-2018 10:35 PM

@VDD Thanks. I've made my description simpler and removed unnecessary variables.

Also, please find the code that I use for this splitting. It only creates the duplicated row, but I couldn't find any way to modify the variables across the duplicated rows.

data want;
set have;
output;
if increment_start~= increment_stop then do;
output;
end;
run;

PGStats · Posted 10-27-2018 12:42 AM

This will also take care of durations longer than 90 days

data want;
retain is0dt;
set have; by id;
if first.id then is0dt = date_start;
ds = date_stop;
do while (date_start <= ds);
    inc = 1 + intck("day90", is0dt, date_start, "c");
    date_stop = min(ds, intnx("day", is0dt, inc*90-1));
    output;
    date_start = intnx("day", date_stop, 1);
    end;
drop is0dt ds;
run;

PG

Alireza_Boloori · Posted 10-27-2018 01:55 PM

@PGStats Thank you very much for your help!

novinosrin · Posted 10-27-2018 02:25 PM

Sir @PGStats Elegance at best

SAS Innovate 2025: Save the Date