BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ha33
Obsidian | Level 7

Hi everyone,

 

I have some prescription redemption data that looks something like this: 

ID ATC P_DATE PACKSIZE DURATION(days)
1 AA10 03_01_12 100 30
1 AA10 03_01_12 100 30
1 AA10 03_01_12 100  30
2 AA10 14_02_12 50 30
2 AA10 15_04_12 50 30
2 BB10 12_01_12 100 60
2 BB10 15_06_12 100 60
3 CC10 27_08_13 50 30
3 CC10 12_09_13 50 30

 

Here, data are sorted by ID, ATC, P_DATE (ascending). 

 

I have the duration in days of each prescription, but some individuals redeem multiple prescriptions for the same drug the same day, and I would like to calculate the total duration for those prescriptions.

 

For example, for the first three rows, I would like to sum the three durations for the prescriptions with the same ID, atc and P_date. 

 

My initial idea was to add up the durations using the LAG function in a way such that for the first three rows (same ID, atc and p_date), the durations are added up in one of the three prescriptions (with a duration of 90 days) and then delete the other two using proc sort and nodupkey later. 

 

This way the prescriptions become a type of "prescription redemption events" such that for each specific ID, date and ATC code, only one prescription with the summed duration exists. 

 

My base code (which unfortunately only partially works - if there are more than 2 identical prescriptiond per date it creates errors and there are other issues): 

data new;
 set old;
by ID atc p_date;
if p_date=lag(p_date) then do;
duration=duration+lag(duration);
end;
run;

However, I haven't been able to do this. Does anyone have a suggestion? Feel free to suggest different approaches. 

 

Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

The data step solution would be to RETAIN a value and add to it.

data new;
  set old;
  by ID atc p_date;
  retain cumduration;
  if first.p_date cumduration=duration;
  else cumduration=cumduration+duration;
  if last.p_date;
run;

Retain means that the variable values are kept across the data step iteration boundary.

The BY creates automatic variables that indicate whether an observation is the first or last of a by group. You access these values using First.variablename and Last.variablename. These are temporary and do not get written to the data set and have values of 1, true, or 0, false for if something is the first or last.

The If Last.p_date is a subsetting if and only observations where the condition is true are kept. So no need for proc sort nodupekey. You could comment out this bit to see the data set get built with the cumulative duration and then uncomment to execute to subset the data.

I add a variable for the cumulative duration so you can see if it works as expected.

 

 

View solution in original post

3 REPLIES 3
ballardw
Super User

The data step solution would be to RETAIN a value and add to it.

data new;
  set old;
  by ID atc p_date;
  retain cumduration;
  if first.p_date cumduration=duration;
  else cumduration=cumduration+duration;
  if last.p_date;
run;

Retain means that the variable values are kept across the data step iteration boundary.

The BY creates automatic variables that indicate whether an observation is the first or last of a by group. You access these values using First.variablename and Last.variablename. These are temporary and do not get written to the data set and have values of 1, true, or 0, false for if something is the first or last.

The If Last.p_date is a subsetting if and only observations where the condition is true are kept. So no need for proc sort nodupekey. You could comment out this bit to see the data set get built with the cumulative duration and then uncomment to execute to subset the data.

I add a variable for the cumulative duration so you can see if it works as expected.

 

 

ha33
Obsidian | Level 7

Thanks a lot - that works perfectly.

Tom
Super User Tom
Super User

Not sure about your overall logic, but you should definitely fix the miss use of LAG().  If you run LAG() only on some of the observations, then it can only return some of the values (the ones you passed it by running LAG()).

data new;
  set old;
  by ID atc p_date;
  lag_duration = lag(duration);
  if p_date=lag(p_date) then do;
 /* Code can now use LAG_DURATION variable to get value from previous observation */
  end;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 537 views
  • 0 likes
  • 3 in conversation