Obsidian | Level 7

## Summing rows with identical values using the LAG function and keeping one row

Hi everyone,

I have some prescription redemption data that looks something like this:

 ID ATC P_DATE PACKSIZE DURATION(days) 1 AA10 03_01_12 100 30 1 AA10 03_01_12 100 30 1 AA10 03_01_12 100 30 2 AA10 14_02_12 50 30 2 AA10 15_04_12 50 30 2 BB10 12_01_12 100 60 2 BB10 15_06_12 100 60 3 CC10 27_08_13 50 30 3 CC10 12_09_13 50 30

Here, data are sorted by ID, ATC, P_DATE (ascending).

I have the duration in days of each prescription, but some individuals redeem multiple prescriptions for the same drug the same day, and I would like to calculate the total duration for those prescriptions.

For example, for the first three rows, I would like to sum the three durations for the prescriptions with the same ID, atc and P_date.

My initial idea was to add up the durations using the LAG function in a way such that for the first three rows (same ID, atc and p_date), the durations are added up in one of the three prescriptions (with a duration of 90 days) and then delete the other two using proc sort and nodupkey later.

This way the prescriptions become a type of "prescription redemption events" such that for each specific ID, date and ATC code, only one prescription with the summed duration exists.

My base code (which unfortunately only partially works - if there are more than 2 identical prescriptiond per date it creates errors and there are other issues):

``````data new;
set old;
by ID atc p_date;
if p_date=lag(p_date) then do;
duration=duration+lag(duration);
end;
run;``````

However, I haven't been able to do this. Does anyone have a suggestion? Feel free to suggest different approaches.

Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: Summing rows with identical values using the LAG function and keeping one row

The data step solution would be to RETAIN a value and add to it.

```data new;
set old;
by ID atc p_date;
retain cumduration;
if first.p_date cumduration=duration;
else cumduration=cumduration+duration;
if last.p_date;
run;```

Retain means that the variable values are kept across the data step iteration boundary.

The BY creates automatic variables that indicate whether an observation is the first or last of a by group. You access these values using First.variablename and Last.variablename. These are temporary and do not get written to the data set and have values of 1, true, or 0, false for if something is the first or last.

The If Last.p_date is a subsetting if and only observations where the condition is true are kept. So no need for proc sort nodupekey. You could comment out this bit to see the data set get built with the cumulative duration and then uncomment to execute to subset the data.

I add a variable for the cumulative duration so you can see if it works as expected.

3 REPLIES 3
Super User

## Re: Summing rows with identical values using the LAG function and keeping one row

The data step solution would be to RETAIN a value and add to it.

```data new;
set old;
by ID atc p_date;
retain cumduration;
if first.p_date cumduration=duration;
else cumduration=cumduration+duration;
if last.p_date;
run;```

Retain means that the variable values are kept across the data step iteration boundary.

The BY creates automatic variables that indicate whether an observation is the first or last of a by group. You access these values using First.variablename and Last.variablename. These are temporary and do not get written to the data set and have values of 1, true, or 0, false for if something is the first or last.

The If Last.p_date is a subsetting if and only observations where the condition is true are kept. So no need for proc sort nodupekey. You could comment out this bit to see the data set get built with the cumulative duration and then uncomment to execute to subset the data.

I add a variable for the cumulative duration so you can see if it works as expected.

Obsidian | Level 7

## Re: Summing rows with identical values using the LAG function and keeping one row

Thanks a lot - that works perfectly.

Super User

## Re: Summing rows with identical values using the LAG function and keeping one row

Not sure about your overall logic, but you should definitely fix the miss use of LAG().  If you run LAG() only on some of the observations, then it can only return some of the values (the ones you passed it by running LAG()).

``````data new;
set old;
by ID atc p_date;
lag_duration = lag(duration);
if p_date=lag(p_date) then do;
/* Code can now use LAG_DURATION variable to get value from previous observation */
end;
run;``````
Discussion stats
• 3 replies
• 420 views
• 0 likes
• 3 in conversation