jimmychoi
Obsidian | Level 7
data a2;
  set a1;
  retain sumx 0;
  sumx = sumx + x;
run;

Dear friends,

what I was told about retain is that it initializes the variable to zero.

When it is used with a SET statement, since SET reads records one at a time, the records of dataset a1 in the example above flow into dataset a2, and the step iterates through them and adds them up (sumx=sumx+x;).

Now, this is the confusing part. sumx=sumx+x; gets executed every time, and what about retain?

Is it re-initialized to zero every time? I know it's not, but its name, RETAIN, is pretty confusing.

Does it really have a function that has to do with RETAINING the values within?

11 REPLIES
ChrisNZ
Tourmaline | Level 20

The best way to answer this type of question (and to memorise the answer) is to try it.

 

If you remove the retain in your data step, you'll see that SUMX is reset to missing with each iteration (in other words, it is reset to missing with each new observation in A1). That's the default behaviour.

 

Retain is a compile-time instruction that tells SAS *not* to reset the listed variables when iterating the data step.

Cynthia_sas
SAS Super FREQ

Hi: Here's a concrete example of what happens without the retain, using this test data:

data a1;
  infile datalines;
  input id $ x;
datalines;
A01 11 
A02 13 
A03 14 
A04 16 
A05  6 
A06 15 
A07 12 
A08 14 
A09  5 
A10 12 
;
run;

proc print data=a1;
  title 'What is in A1';
run;

data a2;
  set a1;
  retain sumx 0;
  sumx = sumx+x;
run;

proc print data=a2;
  title 'What is in A2 when using RETAIN';
run;

data a3;
  set a1;
  /* no RETAIN: sumx is reset to missing before each iteration, so sumx+x is always missing */
  sumx = sumx+x;
run;

proc print data=a3;
  title 'What is in A3 withOUT using RETAIN';
run;

And, here are results from dataset A2 vs dataset A3:

[Image: retain_vs_no_retain.png]

 

Hope this helps explain it.

 

Cynthia

Reeza
Super User

@jimmychoi wrote:
 

what I was told about retain is that it initializes the variable to zero.

 


Start with the documentation instead.

 

This is a good page to bookmark; it lets you quickly jump to any documentation you need:

https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=pgmsashome&docsetTarget=h...

 

Retain is a statement, so click on STATEMENTS and then R, then RETAIN.

 

Specifies that all columns or all columns in the column list will have their values retained between executions of the RUN method.

 

In fact, initialization is not the primary function of RETAIN at all; it's a nice-to-have.

If you want to initialize variables, use this method instead:

 

if _n_=1 then do;
    x=1; y=2;
end;
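
For what it's worth, here is a minimal sketch of how that pattern might look for the running total in this thread. It assumes the A1 test data posted above and that the _N_=1 block is paired with a RETAIN statement that has no initial value (without RETAIN, a new variable assigned only on the first iteration would be missing again from the second observation on). The dataset name a5 is made up for illustration.

```sas
data a5;
  set a1;
  retain sumx;               /* no initial value given: sumx starts out missing */
  if _n_ = 1 then sumx = 0;  /* explicit one-time initialization on the first iteration */
  sumx = sumx + x;           /* running total, carried across iterations by RETAIN */
run;

proc print data=a5;
  title 'Running total with RETAIN plus _N_=1 initialization';
run;
```

With the ten observations in A1, the last value of SUMX should be 118, the same as with RETAIN SUMX 0.
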
hashman
Ammonite | Level 13

@Reeza:

 

This statement in the Viya docs you're citing, "Specifies that all columns or all columns in the column list will have their values retained between executions of the RUN method." is confusing to the core. To anyone capable of reading plain English it would mean that the values will remain unchanged between the executions no matter what - which is absolutely NOT the case.

 

With respect to the unfortunate RETAIN statement, SAS has never fixed its docs since Version 5 (and perhaps earlier) to plainly say that RETAIN does not retain anything but merely prevents certain variables from being automatically reset to missing values at the top of the implied loop. I suspect this confusion will never end.

 

Paul D.

FreelanceReinh
Jade | Level 19

@Reeza: Many thanks for sharing this helpful link.

 

Unfortunately, some of the "quick links" there lead to pages where DATA Step documentation is interspersed with DS2, FedSQL and other special documentation. (This is of course the same issue as with search results on support.sas.com and also with the built-in help files since the advent of DS2 etc.) As a consequence, many keywords occur with two or more meanings, i.e. there are multiple links for the same keyword.

 

It is too easy to accidentally click the wrong link, as you have demonstrated. The "SAS DATA Step Statements: Reference" says about RETAIN:

RETAIN Statement

Causes a variable that is created by an INPUT or assignment statement to retain its value from one iteration of the DATA step to the next.

 

And this sounds much more familiar than the description of the DS2 statement of the same name.

Reeza
Super User

@FreelanceReinh The first time I searched RETAIN I think it returned something in the TTEST docs, so I totally understand that. I've been finding searches useless, whether via Google or the SAS site; it seems you need to know exactly where to click/look.

 

@hashman The language could be clearer, no disagreement there. 

hashman
Ammonite | Level 13

@FreelanceReinh:

 

My dog-eared SAS User's Guide: Basics Version 5 Edition says: 

 

"The RETAIN statement causes a variable created by an INPUT statement or assignment statement to retain its value from the previous iteration of the DATA step." The only change in 2018 from 1985 is "from one iteration ... to the next" instead of "from the previous iteration". But at least the old manual states in the very next sentence: "Without a RETAIN statement, the SAS System automatically sets variables created by INPUT or assignment statements to missing before each iteration of the DATA step.", so an alert reader would hopefully understand right there and then that RETAIN merely turns the automatic clean-up action off.

 

The statement is still not entirely accurate since it doesn't tell the whole truth, for there are other variables, not only those "created by INPUT or assignment statements" that are set to missing at the top of the DATA step implied loop, such as X in: 

 

call missing (X) ;

y = sum (X, 0) ;

 

and so on where X is clearly not created by either an INPUT or assignment statement, and yet not retained (in the correct sense). 
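
A hedged way to see this in the log, using a different kind of variable than the one above: a DO-loop index is also created by neither INPUT nor an assignment statement. The sketch below assumes the A1 test data from earlier in the thread; the variable name K is made up for illustration.

```sas
data _null_;
  set a1;
  /* K has not been touched yet in this iteration */
  put _n_= 'top of step: ' k=;
  do k = 1 to 3;
  end;
  /* the loop leaves K at 4, but that value does not survive into the next iteration */
  put _n_= 'bottom of step: ' k=;
run;
```

On every iteration K should print as missing at the top and as 4 at the bottom. Adding RETAIN K; to the step should make the 4 carry over to the top of the next iteration.
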

 

Paul D.

 

    

 

 

ballardw
Super User

For extra confusion consider:

data a2; 
   set a1;
   sumx+x;
run;

 

Also, if you RETAIN a variable in a dataset referenced on a SET statement you get the value from the dataset each time the SET statement iterates.
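
To unpack the sum statement above: as far as I know, sumx+x; behaves like an implied RETAIN with an initial value of 0 plus a call to the SUM function. A minimal sketch, assuming the A1 test data posted earlier in the thread (the dataset names a2_sum and a2_long are made up for illustration):

```sas
/* Sum statement: SUMX is implicitly retained and starts at 0 */
data a2_sum;
  set a1;
  sumx + x;
run;

/* Roughly equivalent long form */
data a2_long;
  set a1;
  retain sumx 0;
  sumx = sum(sumx, x);  /* SUM() skips a missing X; sumx = sumx + x would turn missing */
run;

/* The two datasets should compare as equal */
proc compare base=a2_sum compare=a2_long;
run;
```
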

hashman
Ammonite | Level 13

@ballardw

 

Oh yes, and examples of this kind can be multiplied. At least in the V5 manual, the initial explanation of RETAIN is followed by a full page occupied by a table showing which sorts of variables aren't affected by RETAIN and which are and how. In particular, it says that variables brought in by SET, MERGE, and UPDATE aren't affected. Now, they would've added MODIFY but it wasn't part of SAS back then.

 

Paul D.

Tom
Super User

Note that any variable that exists in an input dataset (via a SET, MERGE ... statement) is also retained, whether or not it is mentioned in a RETAIN statement (or a sum statement).

 

It is just that when a new observation is read, the new value overwrites the retained value.

 

That is why one-to-many merges work.
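
A small sketch of that last point, with made-up datasets (ONE, MANY, BOTH and their variables are hypothetical, and both inputs are already in ID order):

```sas
data one;
  input id $ region $;
datalines;
A East
B West
;
run;

data many;
  input id $ amount;
datalines;
A 10
A 20
B 5
;
run;

data both;
  merge one many;
  by id;
run;

proc print data=both;
  title 'REGION carried onto every matching row of MANY';
run;
```

ONE contributes only a single observation for id A, yet both A rows in BOTH should show region=East. REGION is not re-read for the second A row; it is simply still sitting there from the first one, with no RETAIN statement in sight.
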

hashman
Ammonite | Level 13

@Tom:

 

In the old V5 manual all the RETAIN actions are laid out clearly and their details under different scenarios are spelled out in Table 4.7, page 196. It is just the first-sentence verbiage that is misleading, and it's remained practically unchanged ever since. One may say that it's been firmly retained ;).

 

Paul D.  
