BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
POOJA_J
Obsidian | Level 7

For practice I was using SUM statement and tried to check if I get same answer using RETAIN statement. To my surprise I get missing values using the RETAIN statement. 

Till now whatever I learned, I assumed either can be used. 

Now I'm confused in what situation should I use either of them? 

POOJA_J_0-1695729583607.pngPOOJA_J_1-1695729603009.pngPOOJA_J_2-1695729622803.png

 

POOJA_J_3-1695729646256.png

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Look at your log.

 

You are RETAINing a variable (total) initialized to missing.  Therefore the statement

 

total=total+salary;

will generate an error message, because arithmetic operators will return missing from missing.

 

You could try intializing total to zero, which would do what you expect as long as SALARY is not missing.

retain TOTAL 0;

The summing statement implies retain and also operates almost like the SUM function.   Which is to say that missing values are ignored.  This function (like MEAN, STD, and all other stat functions) will skip missing values in the argument list.

 

I say the statement is "almost" like the SUM function, because the SUM of two missings is missing, but the sum statement (see observation 1 below) generates a zero.  I haven't looked for it , but I think it is documented that the SUM statement acts as if the new (retained) variable is initialized as 0.

 

data have;
  do i=.,1,2; output; end;
run;
data _null_;
  set have;
  retain sumfunc ;
  sumfunc=sum(sumfunc,i);
  sumstatement+i;
  put (_all_) (=);
run;

which generates the log

62   data _null_;
63     set have;
64     retain sumfunc ;
65     sumfunc=sum(sumfunc,i);
66     sumstatement+i;
67     put (_all_) (=);
68   run;

i=. sumfunc=. sumstatement=0
i=1 sumfunc=1 sumstatement=1
i=2 sumfunc=3 sumstatement=3
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 65:11
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

3 REPLIES 3
Tom
Super User Tom
Super User

The SUM STATEMENT does two things and RETAIN is just one of them.

 

You need to use the SUM() function to handle any missing values.

total=sum(total,salary);

Also you need to initialize total to zero to mimic what the SUM statement will do.

retain total 0;
mkeintz
PROC Star

Look at your log.

 

You are RETAINing a variable (total) initialized to missing.  Therefore the statement

 

total=total+salary;

will generate an error message, because arithmetic operators will return missing from missing.

 

You could try intializing total to zero, which would do what you expect as long as SALARY is not missing.

retain TOTAL 0;

The summing statement implies retain and also operates almost like the SUM function.   Which is to say that missing values are ignored.  This function (like MEAN, STD, and all other stat functions) will skip missing values in the argument list.

 

I say the statement is "almost" like the SUM function, because the SUM of two missings is missing, but the sum statement (see observation 1 below) generates a zero.  I haven't looked for it , but I think it is documented that the SUM statement acts as if the new (retained) variable is initialized as 0.

 

data have;
  do i=.,1,2; output; end;
run;
data _null_;
  set have;
  retain sumfunc ;
  sumfunc=sum(sumfunc,i);
  sumstatement+i;
  put (_all_) (=);
run;

which generates the log

62   data _null_;
63     set have;
64     retain sumfunc ;
65     sumfunc=sum(sumfunc,i);
66     sumstatement+i;
67     put (_all_) (=);
68   run;

i=. sumfunc=. sumstatement=0
i=1 sumfunc=1 sumstatement=1
i=2 sumfunc=3 sumstatement=3
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 65:11
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Kurt_Bremser
Super User

Maxim 1: Read the Documentation.

From SUM Statement :

The variable is automatically set to 0 before SAS reads the first observation.

and

SAS treats an expression that produces a missing value as zero.

Both of these are not done if you use RETAIN without initialization, and a simple calculation with + in the assignment.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 382 views
  • 3 likes
  • 4 in conversation