Solved
New Contributor
Posts: 2

# Data step - retain - by

Hi All,

The sample data set is found below. In summary, for each unique id i have observed months (1 to 3) and three different cases (A, B, C), the observed measurement is recov. In this exercise, CumRecov is  the wanted measurement which is a cumulative measure for each unique id by case.  I know I could create three dataset by case and estimate the cumrecov with the code found below.  In one data statement, I would like to estimate for each id and case the CumRecov across the observed months.

data dataset2;

set dataset1;

by id;

retain cumrecov;

if first.id  then CumRecov=recov;

else CumRecov=CumRecov+Recov;

run;

 ID Month Case Recov CumRecov 11 1 A 0.1 0.1 11 2 A 0.2 0.3 11 3 A 0.5 0.8 11 1 B 0.12 0.12 11 2 B 0.3 0.42 11 3 B 0.1 0.52 11 1 C 0.3 0.3 11 2 C 0.3 0.6 11 3 C 0.2 0.8

Thanks

Accepted Solutions
Solution
‎10-07-2015 09:26 AM
Super User
Posts: 9,617

## Re: Data step - retain - by

You could actually do this with a retain, and lag() function:

```data have;
input ID	Month	Case \$ Recov	CumRecov;
datalines;
11	1	A	0.1	0.1
11	2	A	0.2	0.3
11	3	A	0.5	0.8
11	1	B	0.12	0.12
11	2	B	0.3	0.42
11	3	B	0.1	0.52
;
run;
data want;
set have;
retain cumrecov;
output;
if lag(case)=case then cumrecov=sum(cumrecov,recov);
else cumrecov=recov;
run;```

All Replies
Posts: 4,743

## Re: Data step - retain - by

...

by id case month;

...

if first.case...

...

Solution
‎10-07-2015 09:26 AM
Super User
Posts: 9,617

## Re: Data step - retain - by

You could actually do this with a retain, and lag() function:

```data have;
input ID	Month	Case \$ Recov	CumRecov;
datalines;
11	1	A	0.1	0.1
11	2	A	0.2	0.3
11	3	A	0.5	0.8
11	1	B	0.12	0.12
11	2	B	0.3	0.42
11	3	B	0.1	0.52
;
run;
data want;
set have;
retain cumrecov;
output;
if lag(case)=case then cumrecov=sum(cumrecov,recov);
else cumrecov=recov;
run;```
Valued Guide
Posts: 765

## Re: Data step - retain - by

[ Edited ]

Hi.  I think that solution only works since you include the variable CUMRECOV in the data set HAVE, yes/no?

I think if you ...

#1  get rid of the OUTPUT statement

#2  use a SUM statement (automatic retention of summed variable so you don't need RETAIN)

it works.

data have;
input id month case :\$1. recov;
datalines;
11 1 A 0.1
11 2 A 0.2
11 3 A 0.5
11 1 B 0.12
11 2 B 0.3
11 3 B 0.1
;

data want;
set have;
if lag(case)=case then cumrecov+recov;
else cumrecov=recov;
run;

data set WANT ...

Obs    id    month    case    recov    cumrecov

1     11      1       A       0.10      0.10
2     11      2       A       0.20      0.30
3     11      3       A       0.50      0.80
4     11      1       B       0.12      0.12
5     11      2       B       0.30      0.42
6     11      3       B       0.10      0.52

With the RETAIN, you could also try IFN ...

data want;
retain cumrecov 0;
set have;
cumrecov = ifn (lag(case)=case, cumrecov+recov, recov);
run;

ps Either is certainly a better (more succinct) idea (your use of  LAG) than my DOW loop.

Super User
Posts: 9,617

## Re: Data step - retain - by

Ha.  Yes you are correct.  It just goes to show that the best way to ask a question is to provide a datastep with the data, and then some required output.  I had just copied over what was given and added my thoughts (which is probably why I added the output in the first place).  Good catch.

Valued Guide
Posts: 765

## Re: Data step - retain - by

[ Edited ]

Hi, here's another idea (I added another ID) ...

data have;
input id month case: \$1. recov @@;
datalines;
11 1 A 0.1  11 2 A 0.2    11 3 A 0.5  11 1 B 0.12
11 2 B 0.3  11 3 B 0.1    12 1 A 2.1  12 2 A 2.2
12 3 A 2.5  12 1 B 2.12   12 2 B 2.3  12 3 B 2.1
;

data want;
do until (last.case);
set have;
by id case;
cumrecov + recov;
output;
end;
cumrecov = 0;
run;

data set WANT (printed BY ID CASE; )  ...

id=11 case=A

Obs    month    recov    cumrecov

1      1       0.1        0.1
2      2       0.2        0.3
3      3       0.5        0.8

id=11 case=B

Obs    month    recov    cumrecov

4      1       0.12      0.12
5      2       0.30      0.42
6      3       0.10      0.52

id=12 case=A

Obs    month    recov    cumrecov

7      1       2.1        2.1
8      2       2.2        4.3
9      3       2.5        6.8

id=12 case=B

Obs    month    recov    cumrecov

10      1       2.12      2.12
11      2       2.30      4.42
12      3       2.10      6.52

Valued Guide
Posts: 765

## Re: Data step - retain - by

[ Edited ]

Hi.  The solution you cited as correct does not work.  It appears to work since the variableCUMRECOV is already part of the data set that is used in the data step with the LAG.  Remove that variable from the data set (it's the one you are calculating so it should not be part of your data) and you'll see that it does not work.

Just look at my comment and the subsequent answer by the person (RW9) who posted the solution.

ps  Here's an even shorter solution (it does work) ...

data want;
set have;
cumrecov + recov - (lag(case) ne case)*cumrecov;
run;

🔒 This topic is solved and locked.