Hi everyone,
I have some prescription redemption data that looks something like this:
ID | ATC | P_DATE | PACKSIZE | DURATION(days) |
1 | AA10 | 03_01_12 | 100 | 30 |
1 | AA10 | 03_01_12 | 100 | 30 |
1 | AA10 | 03_01_12 | 100 | 30 |
2 | AA10 | 14_02_12 | 50 | 30 |
2 | AA10 | 15_04_12 | 50 | 30 |
2 | BB10 | 12_01_12 | 100 | 60 |
2 | BB10 | 15_06_12 | 100 | 60 |
3 | CC10 | 27_08_13 | 50 | 30 |
3 | CC10 | 12_09_13 | 50 | 30 |
Here, data are sorted by ID, ATC, P_DATE (ascending).
I have the duration in days of each prescription, but some individuals redeem multiple prescriptions for the same drug the same day, and I would like to calculate the total duration for those prescriptions.
For example, for the first three rows, I would like to sum the three durations for the prescriptions with the same ID, atc and P_date.
My initial idea was to add up the durations using the LAG function in a way such that for the first three rows (same ID, atc and p_date), the durations are added up in one of the three prescriptions (with a duration of 90 days) and then delete the other two using proc sort and nodupkey later.
This way the prescriptions become a type of "prescription redemption events" such that for each specific ID, date and ATC code, only one prescription with the summed duration exists.
My base code (which unfortunately only partially works - if there are more than 2 identical prescriptiond per date it creates errors and there are other issues):
data new;
set old;
by ID atc p_date;
if p_date=lag(p_date) then do;
duration=duration+lag(duration);
end;
run;
However, I haven't been able to do this. Does anyone have a suggestion? Feel free to suggest different approaches.
Thanks.
The data step solution would be to RETAIN a value and add to it.
data new; set old; by ID atc p_date; retain cumduration; if first.p_date cumduration=duration; else cumduration=cumduration+duration; if last.p_date; run;
Retain means that the variable values are kept across the data step iteration boundary.
The BY creates automatic variables that indicate whether an observation is the first or last of a by group. You access these values using First.variablename and Last.variablename. These are temporary and do not get written to the data set and have values of 1, true, or 0, false for if something is the first or last.
The If Last.p_date is a subsetting if and only observations where the condition is true are kept. So no need for proc sort nodupekey. You could comment out this bit to see the data set get built with the cumulative duration and then uncomment to execute to subset the data.
I add a variable for the cumulative duration so you can see if it works as expected.
The data step solution would be to RETAIN a value and add to it.
data new; set old; by ID atc p_date; retain cumduration; if first.p_date cumduration=duration; else cumduration=cumduration+duration; if last.p_date; run;
Retain means that the variable values are kept across the data step iteration boundary.
The BY creates automatic variables that indicate whether an observation is the first or last of a by group. You access these values using First.variablename and Last.variablename. These are temporary and do not get written to the data set and have values of 1, true, or 0, false for if something is the first or last.
The If Last.p_date is a subsetting if and only observations where the condition is true are kept. So no need for proc sort nodupekey. You could comment out this bit to see the data set get built with the cumulative duration and then uncomment to execute to subset the data.
I add a variable for the cumulative duration so you can see if it works as expected.
Thanks a lot - that works perfectly.
Not sure about your overall logic, but you should definitely fix the miss use of LAG(). If you run LAG() only on some of the observations, then it can only return some of the values (the ones you passed it by running LAG()).
data new;
set old;
by ID atc p_date;
lag_duration = lag(duration);
if p_date=lag(p_date) then do;
/* Code can now use LAG_DURATION variable to get value from previous observation */
end;
run;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.