BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
techsassy
Obsidian | Level 7

Hi folks,

 

I've been SAS programming for about a year and have successfully used the RETAIN statement in the past but am having great difficulty using it this morning.

 

At our university, we are trying to examine the year-to-year percent change in students applications, admissions, and subsequent enrollments. I built one data set with numbers for 2014 and another data set for 2015. I then "set" them together into a single data set.

 

data pct_change;
	set tabulate
	    prior_tabulate;
run;

 This results in a data set called pct_change with two observations/rows, which look as follows:

_TYPE_     applicant_sum     admit_sum     enrollment_sum     prior_applicant_sum     prior_admit_sum     prior_enrollment_sum
0          1850              880           330
0                                                             1390                    620                 240

I simply want to squish these observations together so that I can perform some simple arithmetic on new values against prior year values.

 

I know that it would be possible to do this with a MERGE statment but I want to understand how to use the RETAIN statement. Here is what I tried:

data pct_change;
     set pct_change;
     by _type_;
     retain applicant_sum admit_sum enrollment_sum;
run;

I've read about 4 or 5 step-by-step guides and I think the problem is that I am not using an INPUT statement or an assignment statement. I don't understand how to use an INPUT statement in this situation, but I did try to use an assignment statement, as follows:

data pct_change;
     set pct_change;
     by _type_;

     applicant_sum_new = applicant_sum;
     admit_sum_new = admit_sum;
     enrollment_sum_new = enrollment_sum;

     retain applicant_sum_new admit_sum_new enrollment_sum_new;
run;

Now I have performed multiple assignments within the data step. Per SAS documentation, "RETAIN: Causes a variable that is created by an INPUT or assignment statement to retain its value from one iteration of the DATA step to the next.

 

If I just created these variables with an assignment step, why does RETAIN not work like I expect? Can someone please explain how to properly use RETAIN to get values from both observations into a single observation?

 

Thanks so much!!

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

@techsassy wrote:

If the last part of your response is true, then why does RETAIN not work when I "newly create" the variables applicant_sum_new, admit_sum_new, and enrollment_sum_new in the last half of my original post? This is where I'm really getting confused.


Because you always replaced the variables with current values before the implicit OUTPUT that happens at the end of the data step.

See my little correction and test what it does:

data pct_change;
     set pct_change;
     by _type_;

     output;

     applicant_sum_new = applicant_sum;
     admit_sum_new = admit_sum;
     enrollment_sum_new = enrollment_sum;

     retain applicant_sum_new admit_sum_new enrollment_sum_new;
run;

View solution in original post

13 REPLIES 13
RW9
Diamond | Level 26 RW9
Diamond | Level 26

No need to do any of that.  Just merge one dataset to the other:

data pct_change;
  merge tabulate prior_tabulate;
  by _type_;
run;
techsassy
Obsidian | Level 7
Thanks, RW9, this is a solution that I may settle on but for now I would really like to understand how to use the retain statement and why it is not behaving the way I expected it to.
Kurt_Bremser
Super User

@techsassy wrote:
Thanks, RW9, this is a solution that I may settle on but for now I would really like to understand how to use the retain statement and why it is not behaving the way I expected it to.

RETAINING a variable that is part of the input dataset is useless, as the value will always be replaced with the value of the dataset when the next observation is read. RETAIN only works with variables that are newly created in the data step.

techsassy
Obsidian | Level 7

If the last part of your response is true, then why does RETAIN not work when I "newly create" the variables applicant_sum_new, admit_sum_new, and enrollment_sum_new in the last half of my original post? This is where I'm really getting confused.

Kurt_Bremser
Super User

@techsassy wrote:

If the last part of your response is true, then why does RETAIN not work when I "newly create" the variables applicant_sum_new, admit_sum_new, and enrollment_sum_new in the last half of my original post? This is where I'm really getting confused.


Because you always replaced the variables with current values before the implicit OUTPUT that happens at the end of the data step.

See my little correction and test what it does:

data pct_change;
     set pct_change;
     by _type_;

     output;

     applicant_sum_new = applicant_sum;
     admit_sum_new = admit_sum;
     enrollment_sum_new = enrollment_sum;

     retain applicant_sum_new admit_sum_new enrollment_sum_new;
run;
FreelanceReinh
Jade | Level 19

Alternatively, you could perform the assignments conditionally to avoid overwriting:

data pct_change;
     set pct_change;
     by _type_;
     if _n_=1 then do;  /* <---- */
       applicant_sum_new = applicant_sum;
       admit_sum_new = admit_sum;
       enrollment_sum_new = enrollment_sum;
     end;               /* <---- */
     retain applicant_sum_new admit_sum_new enrollment_sum_new;
run;

Also, the documentation says:

"It is redundant to name any of these items in a RETAIN statement, because their values are automatically retained from one iteration of the DATA step to the next:

  • variables that are read with a SET, MERGE, MODIFY or UPDATE statement
  • (...)"

This can be demonstrated by the following data step:

data pct_change;
if _n_=1 then set tabulate;
else set prior_tabulate;
run;

Actually I think you don't want to have two observations, but only one containing all variables. So, you could make use of the fact that within one iteration of the data step no RETAIN is necessary and simply use @Astounding's suggestion using the double SET statement.

Loko
Barite | Level 11

Hello,

 

In

data pct_change;
     set pct_change;
     by _type_;
     retain applicant_sum admit_sum enrollment_sum;
run;

you are telling SAS - retain my variables applicant_sum , admit_sum, enrollment_sum.

 

SAS does what you tell it but the variables have the same name on pct_change database so it overwrites them

when reading observations into PDV.

 

Basically your retain statement does nothing...

 

Consider the following aprroach (code not tested):

 

data want;

set pct_change;

by _type_;

 

retain applicant_sum_r admit_sum_r enrollment_sum_r;

 

if missing(applicant_sum)=0 then applicant_sum_r=applicant_sum;

if missing(admit_sum_r)=0 then admit_sum_r_r=admit_sum_r;

if missing(enrollment_sum_r)=0 then enrollment_sum_r_r=enrollment_sum_r;

 

if last._type_ then output;

 

run;

 

 

techsassy
Obsidian | Level 7

Hi Loko, thanks for your response. 

 

When you say, "the variables have the same name on pct_change database so it overwrites them when reading observations into PDV", do you mean it is happening because of the following code?

data pct_change;
     set pct_change;

Even if I change it to the following, it still doesn't work:

data pct_change_new;
     set pct_change;

Since my variables are called applicant_sum and prior_applicant_sum, admit_sum and prior_admit_sum, enrollment_sum and prior_enrollment_sum, I'm not sure what else you could mean by "the variables have the same name on pct_change database".

 

I did try the code that you provided and was able to tweak it and make it work.

 

More importantly, however, I'm afraid that I still don't really understand why retain isn't working the way I think it should. Can you elaborate on what you mean by "the variables have the same name on pct_change database so it overwrites them"?

 

Thanks!!

Loko
Barite | Level 11

Consider the following example:

data have;

x=1;output;

x=2;output;

x=3;output;

run;

 

data want;

set have;

retain x;

run;

 

what happens in sas is:

Reading the first observation --> x=1

 

Reading the second observation before set statement x=1 (which happens because of the set statement, retain statement being useless in my example, is only to mimic your situation)

Reading the second observation after set statement x=2

 

Reading the third observation before set statement x=2

Reading the third observation after set statement x=3

 

So because the variable have the same name the retain does nothing.

 

Instead consider the following example:

data have;

x=1;output;

x=2;output;

x=3;output;

run;

 

data want;

set have;

retain y;

 

y=x;

run;

 

what happens in sas is:

Reading the first observation --> x=1, y=1

 

Reading the second observation before set statement x=1 (because of the set statement), y=1 (because of retain statement)

Reading the second observation after set statement and before y=x assignment x=2, y=1

Reading the second observation after set statement and after y=x assignment x=2, y=2

 

Reading the third observation before set statement x=2 (because of the set statement), y=2 (because of retain statement)

Reading the third observation after set statement and before y=x assignment x=3, y=2

Reading the third observation after set statement and after y=x assignment x=3, y=3

 

 

 

Kurt_Bremser
Super User

So you want your values from observation 1 to "spill over" into observation 2?

 

Aside from a simple MERGE being the best option for your problem, try

data pct_change;
set pct_change;
x1 = lag(applicant_sum);
x1 = lag(admit_sum);
x3 = lag(enrollment_sum);
if _n_ = 2
then do;
  applicant_sum = x1;
  admit_sum = x2;
  enrollment_sum = x3;
end;
drop x1 x2 x3;
run;

 

 

techsassy
Obsidian | Level 7
Thank you for the response and example, Kurt. Why is LAG a better option than RETAIN? Also, is RETAIN an inappropriate option, or just not a preferred option?
Kurt_Bremser
Super User

No, you can also use a RETAIN, like

data pct_change;
set pct_change;
retain x1 x2 x3;
if _n_ = 2
then do;
  applicant_sum = x1;
  admit_sum = x2;
  enrollment_sum = x3;
end;
x1 = applicant_sum;
x2 = admit_sum;
x3 = enrollment_sum;
drop x1 x2 x3;
run;

See how I changed the order of statements because of the way that the LAG function works.

 

Astounding
PROC Star

Another simple version:

 

data want;

set tabulate;

set prior_tabulate;

run;

 

To add to the many accurate observations that have already been made here ...

 

All variables from SAS data sets are automatically retained.  The RETAIN statement that you have actually does nothing.  You can see this for yourself by adding some diagnostic code to your program:

 

data test;

put _all_;

set pct_change;

put _all_;

run;

 

As has been mentioned, the SET statement brings in another observation.  That new observation overwrites what was in place before, so you don't see the impact of RETAIN after the SET statement has replaced any retained values.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 6956 views
  • 12 likes
  • 6 in conversation