Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- SAS Procedures
- /
- Sum with retain

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 03-31-2021 10:20 AM
(2811 views)

Hello everyone,

I have a problem which I have already solved more or less but was wondering if there might be a more elegant/efficient/faster solution to it. Hope somebody can help me.

```
DATA have;
format month date9.;
input month date9. id amount;
datalines;
31DEC2020 1 10
31JAN2021 1 0
28FEB2021 1 5
31MAR2021 1 0
31DEC2020 2 0
31JAN2021 2 0
28FEB2021 2 5
31MAR2021 2 0
31DEC2020 3 20
31JAN2021 3 10
28FEB2021 3 0
31MAR2021 3 0
;
```

This is the table I have and I want to have a table which gives me for each month the overall sum of the amounts.

I tried it with this:

```
DATA want;
SET have;
BY id;
RETAIN sum_all;
sum_all = sum(amount, sum_all);
RUN;
```

But even though I group by the id the cum calculation does not restart at every new id.

I solved it by sorting the have table and the doing the calculation by introducing two IF conditions:

```
proc sort
data=have
out=have_sorted
;
by
id
month
;
run;
DATA want2;
SET have_sorted;
BY id;
RETAIN sum_all;
IF first.id AND amount GT 0 THEN sum_all = 0;
IF first.id AND amount EQ 0 THEN sum_all = amount;
sum_all = sum(amount, sum_all);
RUN;
```

So the table want2 is how I want the end result to look like, for each id and month sum of amount so far.

Thank you for your help.

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

This is the simplest method:

```
data want;
set have;
by id;
if first.id
then sum_all = amount;
else sum_all + amount;
run;
```

The SUM statement (sum_all + amount) implies two things:

- missing values are considered as zero

- the variable to the left is automatically retained

5 REPLIES 5

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You say you want the following but your results are not by month, but by ID. Which is the ultimate requirement? Assuming by month and ID:

@aguilar_john wrote:

This is the table I have and I want to have a table which gives me

for each monththe overall sum of the amounts.

Logically your code is almost as succinct as it can be (with the exception of using PROC EXPAND instead).

This is giving you a running total, did you want a running total or just a total per month?

If per month, PROC MEANS will be more efficient and can be used unsorted.

```
proc means data=have N SUM NWAY STACKODS;
class month ID;
format month yymmn6.;
var amount;
output out=want1 sum=total_amount n=Number_Obs;
ods output summary = want2;
run;
```

Your current code can be simplified as follows:

Retain is implicit with the SUM statement. Can your data have missing values? If so, you may want the sum function instead, so keeping the RETAIN + SUM() function is a better overall solution.

```
DATA want2;
SET have_sorted;
BY id;
IF first.id THEN sum_all = 0;
sum_all + amount;
RUN;
```

@aguilar_john wrote:

Hello everyone,

I have a problem which I have already solved more or less but was wondering if there might be a more elegant/efficient/faster solution to it. Hope somebody can help me.

`DATA have; format month date9.; input month date9. id amount; datalines; 31DEC2020 1 10 31JAN2021 1 0 28FEB2021 1 5 31MAR2021 1 0 31DEC2020 2 0 31JAN2021 2 0 28FEB2021 2 5 31MAR2021 2 0 31DEC2020 3 20 31JAN2021 3 10 28FEB2021 3 0 31MAR2021 3 0 ;`

This is the table I have and I want to have a table which gives me for each month the overall sum of the amounts.

I tried it with this:

`DATA want; SET have; BY id; RETAIN sum_all; sum_all = sum(amount, sum_all); RUN;`

But even though I group by the id the cum calculation does not restart at every new id.

I solved it by sorting the have table and the doing the calculation by introducing two IF conditions:

`proc sort data=have out=have_sorted ; by id month ; run; RUN; DATA want2; SET have_sorted; BY id; RETAIN sum_all; IF first.id AND amount GT 0 THEN sum_all = 0; IF first.id AND amount EQ 0 THEN sum_all = amount; sum_all = sum(amount, sum_all); RUN;`

So the table want2 is how I want the end result to look like, for each id and month sum of amount so far.

Thank you for your help.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you @Reeza for your quick respond. I indeed want the overall sum by id up to every month. Meaning for id=1 I want the sum_all=10 for DEC2020, for JAN2021 the amount=0 therefore my sum_all=10, for FEB2021 amount=5 and therefore my desired value is sum_all=15 (10+0+5).

So I always want the running total for each id.

Missing values are ok (I would manipulate the final dataset with a PROC STDIZE) but ideally it does not contain any missings.

Does that mean, that there is no other solution than to first sort the dataset and and use SUM and/or RETAIN?

There is no possibilty to do it in one step?

So I always want the running total for each id.

Missing values are ok (I would manipulate the final dataset with a PROC STDIZE) but ideally it does not contain any missings.

Does that mean, that there is no other solution than to first sort the dataset and and use SUM and/or RETAIN?

There is no possibilty to do it in one step?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You want a sum by ID only over all your months not by months. Use the PROC MEANS and remove the MONTH reference will also give you the totals.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

This is the simplest method:

```
data want;
set have;
by id;
if first.id
then sum_all = amount;
else sum_all + amount;
run;
```

The SUM statement (sum_all + amount) implies two things:

- missing values are considered as zero

- the variable to the left is automatically retained

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you @Kurt_Bremser and @Reeza for your help!

The solution @Kurt_Bremser presented suits me the most.

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

**If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. **

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.