I have always struggled with the RETAIN statement vs. the lag() function. I am trying to increment a variable that starts over when at a new patient like this:
Subject | Event |
---|---|
001 | 1 |
001 | 2 |
001 | 3 |
002 | 1 |
002 | 2 |
003 | 1 |
003 | 2 |
003 | 3 |
003 | 4 |
I have tried a couple of things, and neither of them work:
/* Using the RETAIN statement */
data table2;
set table1;
by subject;
if first.subject then do;
retain event 1;
end;
else event=event+1;
run;
/* Using the lag() function */
data table2;
set table1;
by subject;
if first.subject then event=1;
else event=lag(event)+1;
run;
Can someone, first off, help me figure out how I can get the EVENT variable to fill in correctly? And second, is there a simple way to explain the difference between RETAIN and lag()? I have read some SUG literature, but nothing seems to sink in for me.
First, a solution:
if first.subject then event=1;
else event + 1;
Basically, event + 1 has three-fold meaning: (1) add 1 to event, (2) retain event, and (3) if the value to the right of the plus sign is missing, ignore it and let event keep its current value. Obviously that doesn't apply here, since "1" will never be missing. As a side note, this statement gives event a value of 0 at the very beginning of the DATA step, before reading in any data.
LAG is tricky, and does not necessarily retrieve the value from the previous observation. It retrieves the value from the last time the LAG function executed. That's easy to see if you have a third variable in your data set with a log of variability. In that case, try this and see what you get:
if first.subject then new_variable = lag(third_variable);
To force LAG to retrieve the value from the previous observation (which is what you would want most of the time), make sure it executes on every observation ... outside of any IF/THEN statements, before deleting any observations with a subsetting IF.
Good luck.
I would use the sum operator which is automatically retained. e.g.:
data table1;
input Subject $;
cards;
001
001
001
002
002
003
003
003
003
;
data table2;
set table1;
by subject;
if first.subject then event+1;
run;
When I try that, I get the table to look like this:
Subject Event (Should be this)
001 1 1
001 1 2
001 1 3
002 2 1
002 2 2
003 3 1
003 3 2
003 3 3
003 3 4
I did, however, figure out what I was trying to do:
data table2;
set table1;
by subject;
if first.subject then do;
event=1;
retain event;
end;
if first.subject^=1 then event=event+1;
run;
However, I still don't get the difference between RETAIN and lag(). Any advice there?
Sorry, I misunderstood what you wanted. Robert is correct:
data table2;
set table1;
by subject;
if first.subject then event=1;
else event+1;
run;
However, I still don't get the difference between RETAIN and lag(). Any advice there?
I reckon the most basic difference is
:smileymischief:
In your case to keep a variable from being initialized to missing at the top of the data statement loop use RETAIN.
First, a solution:
if first.subject then event=1;
else event + 1;
Basically, event + 1 has three-fold meaning: (1) add 1 to event, (2) retain event, and (3) if the value to the right of the plus sign is missing, ignore it and let event keep its current value. Obviously that doesn't apply here, since "1" will never be missing. As a side note, this statement gives event a value of 0 at the very beginning of the DATA step, before reading in any data.
LAG is tricky, and does not necessarily retrieve the value from the previous observation. It retrieves the value from the last time the LAG function executed. That's easy to see if you have a third variable in your data set with a log of variability. In that case, try this and see what you get:
if first.subject then new_variable = lag(third_variable);
To force LAG to retrieve the value from the previous observation (which is what you would want most of the time), make sure it executes on every observation ... outside of any IF/THEN statements, before deleting any observations with a subsetting IF.
Good luck.
The lag function is tricky about when it actually gets called and whats on the stack when it hasn't been called. Here's a pretty good article on it: http://changchung.com/download/retainLagLeadInterleave_draft.pdf
Of course, to do what you are asking is trivial:
data want;
set table1;
by subject;
if first.subject then event=1;
else event+1;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.