## RETAIN Statement vs. lag() Function

Solved
Regular Contributor
Posts: 247

# RETAIN Statement vs. lag() Function

I have always struggled with the RETAIN statement vs. the lag() function.  I am trying to increment a variable that starts over when at a new patient like this:

SubjectEvent
0011
0012
0013
0021
0022
0031
0032
0033
0034

I have tried a couple of things, and neither of them work:

/* Using the RETAIN statement */

data table2;

set table1;

by subject;

if first.subject then do;

retain event 1;

end;

else event=event+1;

run;

/* Using the lag() function */

data table2;

set table1;

by subject;

if first.subject then event=1;

else event=lag(event)+1;

run;

Can someone, first off, help me figure out how I can get the EVENT variable to fill in correctly?  And second, is there a simple way to explain the difference between RETAIN and lag()?  I have read some SUG literature, but nothing seems to sink in for me.

Accepted Solutions
Solution
‎01-31-2013 02:11 PM
Super User
Posts: 6,641

## Re: RETAIN Statement vs. lag() Function

First, a solution:

if first.subject then event=1;

else event + 1;

Basically, event + 1 has three-fold meaning:  (1) add 1 to event, (2) retain event, and (3) if the value to the right of the plus sign is missing, ignore it and let event keep its current value.  Obviously that doesn't apply here, since "1" will never be missing.  As a side note, this statement gives event a value of 0 at the very beginning of the DATA step, before reading in any data.

LAG is tricky, and does not necessarily retrieve the value from the previous observation.  It retrieves the value from the last time the LAG function executed.  That's easy to see if you have a third variable in your data set with a log of variability.  In that case, try this and see what you get:

if first.subject then new_variable = lag(third_variable);

To force LAG to retrieve the value from the previous observation (which is what you would want most of the time), make sure it executes on every observation ... outside of any IF/THEN statements, before deleting any observations with a subsetting IF.

Good luck.

All Replies
PROC Star
Posts: 8,149

## Re: RETAIN Statement vs. lag() Function

I would use the sum operator which is automatically retained.  e.g.:

data table1;

input Subject \$;

cards;

001

001

001

002

002

003

003

003

003

;

data table2;

set table1;

by subject;

if first.subject then event+1;

run;

Regular Contributor
Posts: 247

## Re: RETAIN Statement vs. lag() Function

When I try that, I get the table to look like this:

Subject     Event     (Should be this)

001          1               1

001          1               2

001          1               3

002          2               1

002          2               2

003          3               1

003          3               2

003          3               3

003          3               4

I did, however, figure out what I was trying to do:

data table2;

set table1;

by subject;

if first.subject then do;

event=1;

retain event;

end;

if first.subject^=1 then event=event+1;

run;

However, I still don't get the difference between RETAIN and lag().  Any advice there?

PROC Star
Posts: 8,149

## Re: RETAIN Statement vs. lag() Function

Sorry, I misunderstood what you wanted.  Robert is correct:

data table2;

set table1;

by subject;

if first.subject then event=1;

else event+1;

run;

Posts: 3,847

## Re: RETAIN Statement vs. lag() Function

``` However, I still don't get the difference between RETAIN and lag().  Any advice there?
```

I reckon the most basic difference is

• RETAIN is a statement
• LAG is a function

:smileymischief:

In your case to keep a variable from being initialized to missing at the top of the data statement loop use RETAIN.

Solution
‎01-31-2013 02:11 PM
Super User
Posts: 6,641

## Re: RETAIN Statement vs. lag() Function

First, a solution:

if first.subject then event=1;

else event + 1;

Basically, event + 1 has three-fold meaning:  (1) add 1 to event, (2) retain event, and (3) if the value to the right of the plus sign is missing, ignore it and let event keep its current value.  Obviously that doesn't apply here, since "1" will never be missing.  As a side note, this statement gives event a value of 0 at the very beginning of the DATA step, before reading in any data.

LAG is tricky, and does not necessarily retrieve the value from the previous observation.  It retrieves the value from the last time the LAG function executed.  That's easy to see if you have a third variable in your data set with a log of variability.  In that case, try this and see what you get:

if first.subject then new_variable = lag(third_variable);

To force LAG to retrieve the value from the previous observation (which is what you would want most of the time), make sure it executes on every observation ... outside of any IF/THEN statements, before deleting any observations with a subsetting IF.

Good luck.

Super Contributor
Posts: 578

## Re: RETAIN Statement vs. lag() Function

The lag function is tricky about when it actually gets called and whats on the stack when it hasn't been called.  Here's a pretty good article on it:  http://changchung.com/download/retainLagLeadInterleave_draft.pdf

Of course, to do what you are asking is trivial:

data want;

set table1;

by subject;

if first.subject then event=1;

else event+1;

run;

🔒 This topic is solved and locked.