BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
djbateman
Lapis Lazuli | Level 10

I have always struggled with the RETAIN statement vs. the lag() function.  I am trying to increment a variable that starts over when at a new patient like this:

SubjectEvent
0011
0012
0013
0021
0022
0031
0032
0033
0034

I have tried a couple of things, and neither of them work:

/* Using the RETAIN statement */

data table2;

      set table1;

      by subject;

      if first.subject then do;

            retain event 1;

      end;

            else event=event+1;

run;

/* Using the lag() function */

data table2;

      set table1;

      by subject;

      if first.subject then event=1;

            else event=lag(event)+1;

run;

Can someone, first off, help me figure out how I can get the EVENT variable to fill in correctly?  And second, is there a simple way to explain the difference between RETAIN and lag()?  I have read some SUG literature, but nothing seems to sink in for me.

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

First, a solution:

if first.subject then event=1;

else event + 1;

Basically, event + 1 has three-fold meaning:  (1) add 1 to event, (2) retain event, and (3) if the value to the right of the plus sign is missing, ignore it and let event keep its current value.  Obviously that doesn't apply here, since "1" will never be missing.  As a side note, this statement gives event a value of 0 at the very beginning of the DATA step, before reading in any data.

LAG is tricky, and does not necessarily retrieve the value from the previous observation.  It retrieves the value from the last time the LAG function executed.  That's easy to see if you have a third variable in your data set with a log of variability.  In that case, try this and see what you get:

if first.subject then new_variable = lag(third_variable);

To force LAG to retrieve the value from the previous observation (which is what you would want most of the time), make sure it executes on every observation ... outside of any IF/THEN statements, before deleting any observations with a subsetting IF.

Good luck.

View solution in original post

6 REPLIES 6
art297
Opal | Level 21

I would use the sum operator which is automatically retained.  e.g.:

data table1;

  input Subject $;

  cards;

001

001

001

002

002

003

003

003

003

;

data table2;

  set table1;

  by subject;

  if first.subject then event+1;

run;

djbateman
Lapis Lazuli | Level 10

When I try that, I get the table to look like this:

Subject     Event     (Should be this)

001          1               1

001          1               2

001          1               3

002          2               1

002          2               2

003          3               1

003          3               2

003          3               3

003          3               4

I did, however, figure out what I was trying to do:

data table2;

      set table1;

      by subject;

      if first.subject then do;

      event=1;

            retain event;

      end;

      if first.subject^=1 then event=event+1;

run;

However, I still don't get the difference between RETAIN and lag().  Any advice there?

art297
Opal | Level 21

Sorry, I misunderstood what you wanted.  Robert is correct:

data table2;

  set table1;

  by subject;

  if first.subject then event=1;

  else event+1;

run;

data_null__
Jade | Level 19

However, I still don't get the difference between RETAIN and lag().  Any advice there?

I reckon the most basic difference is

  • RETAIN is a statement
  • LAG is a function

:smileymischief:

In your case to keep a variable from being initialized to missing at the top of the data statement loop use RETAIN.

Astounding
PROC Star

First, a solution:

if first.subject then event=1;

else event + 1;

Basically, event + 1 has three-fold meaning:  (1) add 1 to event, (2) retain event, and (3) if the value to the right of the plus sign is missing, ignore it and let event keep its current value.  Obviously that doesn't apply here, since "1" will never be missing.  As a side note, this statement gives event a value of 0 at the very beginning of the DATA step, before reading in any data.

LAG is tricky, and does not necessarily retrieve the value from the previous observation.  It retrieves the value from the last time the LAG function executed.  That's easy to see if you have a third variable in your data set with a log of variability.  In that case, try this and see what you get:

if first.subject then new_variable = lag(third_variable);

To force LAG to retrieve the value from the previous observation (which is what you would want most of the time), make sure it executes on every observation ... outside of any IF/THEN statements, before deleting any observations with a subsetting IF.

Good luck.

DBailey
Lapis Lazuli | Level 10

The lag function is tricky about when it actually gets called and whats on the stack when it hasn't been called.  Here's a pretty good article on it:  http://changchung.com/download/retainLagLeadInterleave_draft.pdf

Of course, to do what you are asking is trivial:

data want;

set table1;

by subject;

if first.subject then event=1;

else event+1;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 18542 views
  • 4 likes
  • 5 in conversation