turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- Simple RETAIN Problem

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-23-2016 09:44 AM

Hi folks,

I've been SAS programming for about a year and have successfully used the RETAIN statement in the past but am having great difficulty using it this morning.

At our university, we are trying to examine the year-to-year percent change in students applications, admissions, and subsequent enrollments. I built one data set with numbers for 2014 and another data set for 2015. I then "set" them together into a single data set.

data pct_change; set tabulate prior_tabulate; run;

This results in a data set called pct_change with two observations/rows, which look as follows:

_TYPE_ applicant_sum admit_sum enrollment_sum prior_applicant_sum prior_admit_sum prior_enrollment_sum 0 1850 880 330 0 1390 620 240

I simply want to squish these observations together so that I can perform some simple arithmetic on new values against prior year values.

I know that it would be possible to do this with a MERGE statment but I want to understand how to use the RETAIN statement. Here is what I tried:

data pct_change; set pct_change; by _type_; retain applicant_sum admit_sum enrollment_sum; run;

I've read about 4 or 5 step-by-step guides and I think the problem is that I am not using an INPUT statement or an assignment statement. I don't understand how to use an INPUT statement in this situation, but I did try to use an assignment statement, as follows:

data pct_change; set pct_change; by _type_; applicant_sum_new = applicant_sum; admit_sum_new = admit_sum; enrollment_sum_new = enrollment_sum; retain applicant_sum_new admit_sum_new enrollment_sum_new; run;

Now I have performed multiple assignments within the data step. Per SAS documentation, "**RETAIN: ****Causes a variable that is created by an INPUT or assignment statement to retain its value from one iteration of the DATA step to the next.**"

If I just created these variables with an assignment step, why does RETAIN not work like I expect? Can someone please explain how to properly use RETAIN to get values from both observations into a single observation?

Thanks so much!!

Accepted Solutions

Solution

03-23-2016
11:22 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to techsassy

03-23-2016 10:18 AM

techsassy wrote:

If the last part of your response is true, then why does RETAIN not work when I "newly create" the variables applicant_sum_new, admit_sum_new, and enrollment_sum_new in the last half of my original post? This is where I'm really getting confused.

Because you always replaced the variables with current values before the implicit OUTPUT that happens at the end of the data step.

See my little correction and test what it does:

```
data pct_change;
set pct_change;
by _type_;
output;
applicant_sum_new = applicant_sum;
admit_sum_new = admit_sum;
enrollment_sum_new = enrollment_sum;
retain applicant_sum_new admit_sum_new enrollment_sum_new;
run;
```

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

Maxims of Maximally Efficient SAS Programmers

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to techsassy

03-23-2016 09:48 AM

No need to do any of that. Just merge one dataset to the other:

data pct_change; merge tabulate prior_tabulate; by _type_; run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

03-23-2016 10:05 AM

Thanks, RW9, this is a solution that I may settle on but for now I would really like to understand how to use the retain statement and why it is not behaving the way I expected it to.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to techsassy

03-23-2016 10:11 AM

techsassy wrote:

Thanks, RW9, this is a solution that I may settle on but for now I would really like to understand how to use the retain statement and why it is not behaving the way I expected it to.

RETAINING a variable that is part of the input dataset is useless, as the value will always be replaced with the value of the dataset when the next observation is read. RETAIN only works with variables that are newly created in the data step.

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

Maxims of Maximally Efficient SAS Programmers

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to KurtBremser

03-23-2016 10:13 AM

Solution

03-23-2016
11:22 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to techsassy

03-23-2016 10:18 AM

techsassy wrote:

Because you always replaced the variables with current values before the implicit OUTPUT that happens at the end of the data step.

See my little correction and test what it does:

```
data pct_change;
set pct_change;
by _type_;
output;
applicant_sum_new = applicant_sum;
admit_sum_new = admit_sum;
enrollment_sum_new = enrollment_sum;
retain applicant_sum_new admit_sum_new enrollment_sum_new;
run;
```

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

Maxims of Maximally Efficient SAS Programmers

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to techsassy

03-23-2016 10:36 AM

Alternatively, you could perform the assignments conditionally to avoid overwriting:

```
data pct_change;
set pct_change;
by _type_;
if _n_=1 then do; /* <---- */
applicant_sum_new = applicant_sum;
admit_sum_new = admit_sum;
enrollment_sum_new = enrollment_sum;
end; /* <---- */
retain applicant_sum_new admit_sum_new enrollment_sum_new;
run;
```

Also, the documentation says:

"It is redundant to name any of these items in a RETAIN statement, because their values are automatically retained from one iteration of the DATA step to the next:

- variables that are read with a SET, MERGE, MODIFY or UPDATE statement
- (...)"

This can be demonstrated by the following data step:

```
data pct_change;
if _n_=1 then set tabulate;
else set prior_tabulate;
run;
```

Actually I think you don't want to have two observations, but only one containing all variables. So, you could make use of the fact that within one iteration of the data step no RETAIN is necessary and simply use @Astounding's suggestion using the double SET statement.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to techsassy

03-23-2016 09:53 AM

Hello,

In

data pct_change; set pct_change; by _type_; retain applicant_sum admit_sum enrollment_sum; run;

you are telling SAS - retain my variables applicant_sum , admit_sum, enrollment_sum.

SAS does what you tell it but the variables have the same name on pct_change database so it overwrites them

when reading observations into PDV.

Basically your retain statement does nothing...

Consider the following aprroach (code not tested):

data want;

set pct_change;

by _type_;

retain applicant_sum_r admit_sum_r enrollment_sum_r;

if missing(applicant_sum)=0 then applicant_sum_r=applicant_sum;

if missing(admit_sum_r)=0 then admit_sum_r_r=admit_sum_r;

if missing(enrollment_sum_r)=0 then enrollment_sum_r_r=enrollment_sum_r;

if last._type_ then output;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Loko

03-23-2016 10:04 AM - edited 03-23-2016 10:07 AM

Hi Loko, thanks for your response.

When you say, "the variables have the same name on pct_change database so it overwrites them when reading observations into PDV", do you mean it is happening because of the following code?

data pct_change; set pct_change;

Even if I change it to the following, it still doesn't work:

data pct_change_new; set pct_change;

Since my variables are called applicant_sum and prior_applicant_sum, admit_sum and prior_admit_sum, enrollment_sum and prior_enrollment_sum, I'm not sure what else you could mean by "the variables have the same name on pct_change database".

I did try the code that you provided and was able to tweak it and make it work.

More importantly, however, I'm afraid that I still don't really understand why retain isn't working the way I think it should. Can you elaborate on what you mean by "the variables have the same name on pct_change database so it overwrites them"?

Thanks!!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to techsassy

03-23-2016 11:06 AM

Consider the following example:

data have;

x=1;output;

x=2;output;

x=3;output;

run;

data want;

set have;

retain x;

run;

what happens in sas is:

Reading the first observation --> x=1

Reading the second observation **before set statement** **x=1 **(which happens because of the set statement, **retain statement being useless in my example**, is only to mimic your situation)

Reading the second observation **after set statement** **x=2**

Reading the third observation **before set statement** **x=2**

Reading the third observation **after set statement** **x=3**

So because the variable have the same name the retain does nothing.

Instead consider the following example:

data have;

x=1;output;

x=2;output;

x=3;output;

run;

data want;

set have;

retain y;

y=x;

run;

what happens in sas is:

Reading the first observation --> x=1, y=1

Reading the second observation **before set statement** **x=1 **(because of the set statement), y=1 (**because of retain statement)**

Reading the second observation **after set statement and before y=x assignment** **x=2, y=1**

Reading the second observation **after set statement and after y=x assignment** **x=2, y=2**

Reading the third observation **before set statement** **x=2 **(because of the set statement), y=2 (**because of retain statement)**

Reading the third observation **after set statement and before y=x assignment** **x=3, y=2**

Reading the third observation** after set statement and after y=x assignment x=3, y=3**

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to techsassy

03-23-2016 10:08 AM

So you want your values from observation 1 to "spill over" into observation 2?

Aside from a simple MERGE being the best option for your problem, try

```
data pct_change;
set pct_change;
x1 = lag(applicant_sum);
x1 = lag(admit_sum);
x3 = lag(enrollment_sum);
if _n_ = 2
then do;
applicant_sum = x1;
admit_sum = x2;
enrollment_sum = x3;
end;
drop x1 x2 x3;
run;
```

Maxims of Maximally Efficient SAS Programmers

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to KurtBremser

03-23-2016 10:11 AM

Thank you for the response and example, Kurt. Why is LAG a better option than RETAIN? Also, is RETAIN an inappropriate option, or just not a preferred option?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to techsassy

03-23-2016 10:15 AM

No, you can also use a RETAIN, like

```
data pct_change;
set pct_change;
retain x1 x2 x3;
if _n_ = 2
then do;
applicant_sum = x1;
admit_sum = x2;
enrollment_sum = x3;
end;
x1 = applicant_sum;
x2 = admit_sum;
x3 = enrollment_sum;
drop x1 x2 x3;
run;
```

See how I changed the order of statements because of the way that the LAG function works.

Maxims of Maximally Efficient SAS Programmers

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to techsassy

03-23-2016 10:29 AM

Another simple version:

data want;

set tabulate;

set prior_tabulate;

run;

To add to the many accurate observations that have already been made here ...

All variables from SAS data sets are automatically retained. The RETAIN statement that you have actually does nothing. You can see this for yourself by adding some diagnostic code to your program:

data test;

put _all_;

set pct_change;

put _all_;

run;

As has been mentioned, the SET statement brings in another observation. That new observation overwrites what was in place before, so you don't see the impact of RETAIN after the SET statement has replaced any retained values.