Simple RETAIN Problem

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 16
Accepted Solution

Simple RETAIN Problem

Hi folks,

 

I've been SAS programming for about a year and have successfully used the RETAIN statement in the past but am having great difficulty using it this morning.

 

At our university, we are trying to examine the year-to-year percent change in students applications, admissions, and subsequent enrollments. I built one data set with numbers for 2014 and another data set for 2015. I then "set" them together into a single data set.

 

data pct_change;
	set tabulate
	    prior_tabulate;
run;

 This results in a data set called pct_change with two observations/rows, which look as follows:

_TYPE_     applicant_sum     admit_sum     enrollment_sum     prior_applicant_sum     prior_admit_sum     prior_enrollment_sum
0          1850              880           330
0                                                             1390                    620                 240

I simply want to squish these observations together so that I can perform some simple arithmetic on new values against prior year values.

 

I know that it would be possible to do this with a MERGE statment but I want to understand how to use the RETAIN statement. Here is what I tried:

data pct_change;
     set pct_change;
     by _type_;
     retain applicant_sum admit_sum enrollment_sum;
run;

I've read about 4 or 5 step-by-step guides and I think the problem is that I am not using an INPUT statement or an assignment statement. I don't understand how to use an INPUT statement in this situation, but I did try to use an assignment statement, as follows:

data pct_change;
     set pct_change;
     by _type_;

     applicant_sum_new = applicant_sum;
     admit_sum_new = admit_sum;
     enrollment_sum_new = enrollment_sum;

     retain applicant_sum_new admit_sum_new enrollment_sum_new;
run;

Now I have performed multiple assignments within the data step. Per SAS documentation, "RETAIN: Causes a variable that is created by an INPUT or assignment statement to retain its value from one iteration of the DATA step to the next.

 

If I just created these variables with an assignment step, why does RETAIN not work like I expect? Can someone please explain how to properly use RETAIN to get values from both observations into a single observation?

 

Thanks so much!!


Accepted Solutions
Solution
‎03-23-2016 11:22 AM
Esteemed Advisor
Posts: 6,661

Re: Simple RETAIN Problem


techsassy wrote:

If the last part of your response is true, then why does RETAIN not work when I "newly create" the variables applicant_sum_new, admit_sum_new, and enrollment_sum_new in the last half of my original post? This is where I'm really getting confused.


Because you always replaced the variables with current values before the implicit OUTPUT that happens at the end of the data step.

See my little correction and test what it does:

data pct_change;
     set pct_change;
     by _type_;

     output;

     applicant_sum_new = applicant_sum;
     admit_sum_new = admit_sum;
     enrollment_sum_new = enrollment_sum;

     retain applicant_sum_new admit_sum_new enrollment_sum_new;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Esteemed Advisor
Esteemed Advisor
Posts: 7,217

Re: Simple RETAIN Problem

No need to do any of that.  Just merge one dataset to the other:

data pct_change;
  merge tabulate prior_tabulate;
  by _type_;
run;
Occasional Contributor
Posts: 16

Re: Simple RETAIN Problem

Thanks, RW9, this is a solution that I may settle on but for now I would really like to understand how to use the retain statement and why it is not behaving the way I expected it to.
Esteemed Advisor
Posts: 6,661

Re: Simple RETAIN Problem


techsassy wrote:
Thanks, RW9, this is a solution that I may settle on but for now I would really like to understand how to use the retain statement and why it is not behaving the way I expected it to.

RETAINING a variable that is part of the input dataset is useless, as the value will always be replaced with the value of the dataset when the next observation is read. RETAIN only works with variables that are newly created in the data step.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 16

Re: Simple RETAIN Problem

If the last part of your response is true, then why does RETAIN not work when I "newly create" the variables applicant_sum_new, admit_sum_new, and enrollment_sum_new in the last half of my original post? This is where I'm really getting confused.

Solution
‎03-23-2016 11:22 AM
Esteemed Advisor
Posts: 6,661

Re: Simple RETAIN Problem


techsassy wrote:

If the last part of your response is true, then why does RETAIN not work when I "newly create" the variables applicant_sum_new, admit_sum_new, and enrollment_sum_new in the last half of my original post? This is where I'm really getting confused.


Because you always replaced the variables with current values before the implicit OUTPUT that happens at the end of the data step.

See my little correction and test what it does:

data pct_change;
     set pct_change;
     by _type_;

     output;

     applicant_sum_new = applicant_sum;
     admit_sum_new = admit_sum;
     enrollment_sum_new = enrollment_sum;

     retain applicant_sum_new admit_sum_new enrollment_sum_new;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Trusted Advisor
Posts: 1,114

Re: Simple RETAIN Problem

Alternatively, you could perform the assignments conditionally to avoid overwriting:

data pct_change;
     set pct_change;
     by _type_;
     if _n_=1 then do;  /* <---- */
       applicant_sum_new = applicant_sum;
       admit_sum_new = admit_sum;
       enrollment_sum_new = enrollment_sum;
     end;               /* <---- */
     retain applicant_sum_new admit_sum_new enrollment_sum_new;
run;

Also, the documentation says:

"It is redundant to name any of these items in a RETAIN statement, because their values are automatically retained from one iteration of the DATA step to the next:

  • variables that are read with a SET, MERGE, MODIFY or UPDATE statement
  • (...)"

This can be demonstrated by the following data step:

data pct_change;
if _n_=1 then set tabulate;
else set prior_tabulate;
run;

Actually I think you don't want to have two observations, but only one containing all variables. So, you could make use of the fact that within one iteration of the data step no RETAIN is necessary and simply use @Astounding's suggestion using the double SET statement.

Super Contributor
Posts: 305

Re: Simple RETAIN Problem

Hello,

 

In

data pct_change;
     set pct_change;
     by _type_;
     retain applicant_sum admit_sum enrollment_sum;
run;

you are telling SAS - retain my variables applicant_sum , admit_sum, enrollment_sum.

 

SAS does what you tell it but the variables have the same name on pct_change database so it overwrites them

when reading observations into PDV.

 

Basically your retain statement does nothing...

 

Consider the following aprroach (code not tested):

 

data want;

set pct_change;

by _type_;

 

retain applicant_sum_r admit_sum_r enrollment_sum_r;

 

if missing(applicant_sum)=0 then applicant_sum_r=applicant_sum;

if missing(admit_sum_r)=0 then admit_sum_r_r=admit_sum_r;

if missing(enrollment_sum_r)=0 then enrollment_sum_r_r=enrollment_sum_r;

 

if last._type_ then output;

 

run;

 

 

Occasional Contributor
Posts: 16

Re: Simple RETAIN Problem

[ Edited ]

Hi Loko, thanks for your response. 

 

When you say, "the variables have the same name on pct_change database so it overwrites them when reading observations into PDV", do you mean it is happening because of the following code?

data pct_change;
     set pct_change;

Even if I change it to the following, it still doesn't work:

data pct_change_new;
     set pct_change;

Since my variables are called applicant_sum and prior_applicant_sum, admit_sum and prior_admit_sum, enrollment_sum and prior_enrollment_sum, I'm not sure what else you could mean by "the variables have the same name on pct_change database".

 

I did try the code that you provided and was able to tweak it and make it work.

 

More importantly, however, I'm afraid that I still don't really understand why retain isn't working the way I think it should. Can you elaborate on what you mean by "the variables have the same name on pct_change database so it overwrites them"?

 

Thanks!!

Super Contributor
Posts: 305

Re: Simple RETAIN Problem

Consider the following example:

data have;

x=1;output;

x=2;output;

x=3;output;

run;

 

data want;

set have;

retain x;

run;

 

what happens in sas is:

Reading the first observation --> x=1

 

Reading the second observation before set statement x=1 (which happens because of the set statement, retain statement being useless in my example, is only to mimic your situation)

Reading the second observation after set statement x=2

 

Reading the third observation before set statement x=2

Reading the third observation after set statement x=3

 

So because the variable have the same name the retain does nothing.

 

Instead consider the following example:

data have;

x=1;output;

x=2;output;

x=3;output;

run;

 

data want;

set have;

retain y;

 

y=x;

run;

 

what happens in sas is:

Reading the first observation --> x=1, y=1

 

Reading the second observation before set statement x=1 (because of the set statement), y=1 (because of retain statement)

Reading the second observation after set statement and before y=x assignment x=2, y=1

Reading the second observation after set statement and after y=x assignment x=2, y=2

 

Reading the third observation before set statement x=2 (because of the set statement), y=2 (because of retain statement)

Reading the third observation after set statement and before y=x assignment x=3, y=2

Reading the third observation after set statement and after y=x assignment x=3, y=3

 

 

 

Esteemed Advisor
Posts: 6,661

Re: Simple RETAIN Problem

So you want your values from observation 1 to "spill over" into observation 2?

 

Aside from a simple MERGE being the best option for your problem, try

data pct_change;
set pct_change;
x1 = lag(applicant_sum);
x1 = lag(admit_sum);
x3 = lag(enrollment_sum);
if _n_ = 2
then do;
  applicant_sum = x1;
  admit_sum = x2;
  enrollment_sum = x3;
end;
drop x1 x2 x3;
run;

 

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 16

Re: Simple RETAIN Problem

Thank you for the response and example, Kurt. Why is LAG a better option than RETAIN? Also, is RETAIN an inappropriate option, or just not a preferred option?
Esteemed Advisor
Posts: 6,661

Re: Simple RETAIN Problem

No, you can also use a RETAIN, like

data pct_change;
set pct_change;
retain x1 x2 x3;
if _n_ = 2
then do;
  applicant_sum = x1;
  admit_sum = x2;
  enrollment_sum = x3;
end;
x1 = applicant_sum;
x2 = admit_sum;
x3 = enrollment_sum;
drop x1 x2 x3;
run;

See how I changed the order of statements because of the way that the LAG function works.

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Respected Advisor
Posts: 4,976

Re: Simple RETAIN Problem

Another simple version:

 

data want;

set tabulate;

set prior_tabulate;

run;

 

To add to the many accurate observations that have already been made here ...

 

All variables from SAS data sets are automatically retained.  The RETAIN statement that you have actually does nothing.  You can see this for yourself by adding some diagnostic code to your program:

 

data test;

put _all_;

set pct_change;

put _all_;

run;

 

As has been mentioned, the SET statement brings in another observation.  That new observation overwrites what was in place before, so you don't see the impact of RETAIN after the SET statement has replaced any retained values.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 792 views
  • 9 likes
  • 6 in conversation