## Simple RETAIN Problem

Solved
Occasional Contributor
Posts: 16

# Simple RETAIN Problem

Hi folks,

I've been SAS programming for about a year and have successfully used the RETAIN statement in the past but am having great difficulty using it this morning.

At our university, we are trying to examine the year-to-year percent change in students applications, admissions, and subsequent enrollments. I built one data set with numbers for 2014 and another data set for 2015. I then "set" them together into a single data set.

```data pct_change;
set tabulate
prior_tabulate;
run;```

This results in a data set called pct_change with two observations/rows, which look as follows:

```_TYPE_     applicant_sum     admit_sum     enrollment_sum     prior_applicant_sum     prior_admit_sum     prior_enrollment_sum
0          1850              880           330
0                                                             1390                    620                 240```

I simply want to squish these observations together so that I can perform some simple arithmetic on new values against prior year values.

I know that it would be possible to do this with a MERGE statment but I want to understand how to use the RETAIN statement. Here is what I tried:

```data pct_change;
set pct_change;
by _type_;
run;```

I've read about 4 or 5 step-by-step guides and I think the problem is that I am not using an INPUT statement or an assignment statement. I don't understand how to use an INPUT statement in this situation, but I did try to use an assignment statement, as follows:

```data pct_change;
set pct_change;
by _type_;

applicant_sum_new = applicant_sum;
enrollment_sum_new = enrollment_sum;

run;```

Now I have performed multiple assignments within the data step. Per SAS documentation, "RETAIN: Causes a variable that is created by an INPUT or assignment statement to retain its value from one iteration of the DATA step to the next.

If I just created these variables with an assignment step, why does RETAIN not work like I expect? Can someone please explain how to properly use RETAIN to get values from both observations into a single observation?

Thanks so much!!

Accepted Solutions
Solution
‎03-23-2016 11:22 AM
Super User
Posts: 10,217

## Re: Simple RETAIN Problem

techsassy wrote:

If the last part of your response is true, then why does RETAIN not work when I "newly create" the variables applicant_sum_new, admit_sum_new, and enrollment_sum_new in the last half of my original post? This is where I'm really getting confused.

Because you always replaced the variables with current values before the implicit OUTPUT that happens at the end of the data step.

See my little correction and test what it does:

``````data pct_change;
set pct_change;
by _type_;

output;

applicant_sum_new = applicant_sum;
enrollment_sum_new = enrollment_sum;

run;``````
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code

All Replies
Super User
Posts: 9,599

## Re: Simple RETAIN Problem

No need to do any of that.  Just merge one dataset to the other:

```data pct_change;
merge tabulate prior_tabulate;
by _type_;
run;```
Occasional Contributor
Posts: 16

## Re: Simple RETAIN Problem

Thanks, RW9, this is a solution that I may settle on but for now I would really like to understand how to use the retain statement and why it is not behaving the way I expected it to.
Super User
Posts: 10,217

## Re: Simple RETAIN Problem

techsassy wrote:
Thanks, RW9, this is a solution that I may settle on but for now I would really like to understand how to use the retain statement and why it is not behaving the way I expected it to.

RETAINING a variable that is part of the input dataset is useless, as the value will always be replaced with the value of the dataset when the next observation is read. RETAIN only works with variables that are newly created in the data step.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 16

## Re: Simple RETAIN Problem

If the last part of your response is true, then why does RETAIN not work when I "newly create" the variables applicant_sum_new, admit_sum_new, and enrollment_sum_new in the last half of my original post? This is where I'm really getting confused.

Solution
‎03-23-2016 11:22 AM
Super User
Posts: 10,217

## Re: Simple RETAIN Problem

techsassy wrote:

If the last part of your response is true, then why does RETAIN not work when I "newly create" the variables applicant_sum_new, admit_sum_new, and enrollment_sum_new in the last half of my original post? This is where I'm really getting confused.

Because you always replaced the variables with current values before the implicit OUTPUT that happens at the end of the data step.

See my little correction and test what it does:

``````data pct_change;
set pct_change;
by _type_;

output;

applicant_sum_new = applicant_sum;
enrollment_sum_new = enrollment_sum;

run;``````
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Posts: 1,246

## Re: Simple RETAIN Problem

Alternatively, you could perform the assignments conditionally to avoid overwriting:

``````data pct_change;
set pct_change;
by _type_;
if _n_=1 then do;  /* <---- */
applicant_sum_new = applicant_sum;
enrollment_sum_new = enrollment_sum;
end;               /* <---- */
run;``````

Also, the documentation says:

"It is redundant to name any of these items in a RETAIN statement, because their values are automatically retained from one iteration of the DATA step to the next:

• variables that are read with a SET, MERGE, MODIFY or UPDATE statement
• (...)"

This can be demonstrated by the following data step:

``````data pct_change;
if _n_=1 then set tabulate;
else set prior_tabulate;
run;``````

Actually I think you don't want to have two observations, but only one containing all variables. So, you could make use of the fact that within one iteration of the data step no RETAIN is necessary and simply use @Astounding's suggestion using the double SET statement.

Super Contributor
Posts: 319

## Re: Simple RETAIN Problem

Hello,

In

```data pct_change;
set pct_change;
by _type_;
run;```

you are telling SAS - retain my variables applicant_sum , admit_sum, enrollment_sum.

SAS does what you tell it but the variables have the same name on pct_change database so it overwrites them

Basically your retain statement does nothing...

Consider the following aprroach (code not tested):

data want;

set pct_change;

by _type_;

if missing(applicant_sum)=0 then applicant_sum_r=applicant_sum;

if missing(enrollment_sum_r)=0 then enrollment_sum_r_r=enrollment_sum_r;

if last._type_ then output;

run;

Occasional Contributor
Posts: 16

## Re: Simple RETAIN Problem

[ Edited ]

Hi Loko, thanks for your response.

When you say, "the variables have the same name on pct_change database so it overwrites them when reading observations into PDV", do you mean it is happening because of the following code?

```data pct_change;
set pct_change;```

Even if I change it to the following, it still doesn't work:

```data pct_change_new;
set pct_change;```

Since my variables are called applicant_sum and prior_applicant_sum, admit_sum and prior_admit_sum, enrollment_sum and prior_enrollment_sum, I'm not sure what else you could mean by "the variables have the same name on pct_change database".

I did try the code that you provided and was able to tweak it and make it work.

More importantly, however, I'm afraid that I still don't really understand why retain isn't working the way I think it should. Can you elaborate on what you mean by "the variables have the same name on pct_change database so it overwrites them"?

Thanks!!

Super Contributor
Posts: 319

## Re: Simple RETAIN Problem

Consider the following example:

data have;

x=1;output;

x=2;output;

x=3;output;

run;

data want;

set have;

retain x;

run;

what happens in sas is:

Reading the first observation --> x=1

Reading the second observation before set statement x=1 (which happens because of the set statement, retain statement being useless in my example, is only to mimic your situation)

Reading the second observation after set statement x=2

Reading the third observation before set statement x=2

Reading the third observation after set statement x=3

So because the variable have the same name the retain does nothing.

data have;

x=1;output;

x=2;output;

x=3;output;

run;

data want;

set have;

retain y;

y=x;

run;

what happens in sas is:

Reading the first observation --> x=1, y=1

Reading the second observation before set statement x=1 (because of the set statement), y=1 (because of retain statement)

Reading the second observation after set statement and before y=x assignment x=2, y=1

Reading the second observation after set statement and after y=x assignment x=2, y=2

Reading the third observation before set statement x=2 (because of the set statement), y=2 (because of retain statement)

Reading the third observation after set statement and before y=x assignment x=3, y=2

Reading the third observation after set statement and after y=x assignment x=3, y=3

Super User
Posts: 10,217

## Re: Simple RETAIN Problem

So you want your values from observation 1 to "spill over" into observation 2?

Aside from a simple MERGE being the best option for your problem, try

``````data pct_change;
set pct_change;
x1 = lag(applicant_sum);
x3 = lag(enrollment_sum);
if _n_ = 2
then do;
applicant_sum = x1;
enrollment_sum = x3;
end;
drop x1 x2 x3;
run;``````

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 16

## Re: Simple RETAIN Problem

Thank you for the response and example, Kurt. Why is LAG a better option than RETAIN? Also, is RETAIN an inappropriate option, or just not a preferred option?
Super User
Posts: 10,217

## Re: Simple RETAIN Problem

No, you can also use a RETAIN, like

``````data pct_change;
set pct_change;
retain x1 x2 x3;
if _n_ = 2
then do;
applicant_sum = x1;
enrollment_sum = x3;
end;
x1 = applicant_sum;
x3 = enrollment_sum;
drop x1 x2 x3;
run;``````

See how I changed the order of statements because of the way that the LAG function works.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Posts: 6,762

## Re: Simple RETAIN Problem

Another simple version:

data want;

set tabulate;

set prior_tabulate;

run;

All variables from SAS data sets are automatically retained.  The RETAIN statement that you have actually does nothing.  You can see this for yourself by adding some diagnostic code to your program:

data test;

put _all_;

set pct_change;

put _all_;

run;

As has been mentioned, the SET statement brings in another observation.  That new observation overwrites what was in place before, so you don't see the impact of RETAIN after the SET statement has replaced any retained values.

🔒 This topic is solved and locked.