For practice I was using SUM statement and tried to check if I get same answer using RETAIN statement. To my surprise I get missing values using the RETAIN statement.
Till now whatever I learned, I assumed either can be used.
Now I'm confused in what situation should I use either of them?
Look at your log.
You are RETAINing a variable (total) initialized to missing. Therefore the statement
total=total+salary;
will generate an error message, because arithmetic operators will return missing from missing.
You could try intializing total to zero, which would do what you expect as long as SALARY is not missing.
retain TOTAL 0;
The summing statement implies retain and also operates almost like the SUM function. Which is to say that missing values are ignored. This function (like MEAN, STD, and all other stat functions) will skip missing values in the argument list.
I say the statement is "almost" like the SUM function, because the SUM of two missings is missing, but the sum statement (see observation 1 below) generates a zero. I haven't looked for it , but I think it is documented that the SUM statement acts as if the new (retained) variable is initialized as 0.
data have;
do i=.,1,2; output; end;
run;
data _null_;
set have;
retain sumfunc ;
sumfunc=sum(sumfunc,i);
sumstatement+i;
put (_all_) (=);
run;
which generates the log
62 data _null_;
63 set have;
64 retain sumfunc ;
65 sumfunc=sum(sumfunc,i);
66 sumstatement+i;
67 put (_all_) (=);
68 run;
i=. sumfunc=. sumstatement=0
i=1 sumfunc=1 sumstatement=1
i=2 sumfunc=3 sumstatement=3
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 65:11
The SUM STATEMENT does two things and RETAIN is just one of them.
You need to use the SUM() function to handle any missing values.
total=sum(total,salary);
Also you need to initialize total to zero to mimic what the SUM statement will do.
retain total 0;
Look at your log.
You are RETAINing a variable (total) initialized to missing. Therefore the statement
total=total+salary;
will generate an error message, because arithmetic operators will return missing from missing.
You could try intializing total to zero, which would do what you expect as long as SALARY is not missing.
retain TOTAL 0;
The summing statement implies retain and also operates almost like the SUM function. Which is to say that missing values are ignored. This function (like MEAN, STD, and all other stat functions) will skip missing values in the argument list.
I say the statement is "almost" like the SUM function, because the SUM of two missings is missing, but the sum statement (see observation 1 below) generates a zero. I haven't looked for it , but I think it is documented that the SUM statement acts as if the new (retained) variable is initialized as 0.
data have;
do i=.,1,2; output; end;
run;
data _null_;
set have;
retain sumfunc ;
sumfunc=sum(sumfunc,i);
sumstatement+i;
put (_all_) (=);
run;
which generates the log
62 data _null_;
63 set have;
64 retain sumfunc ;
65 sumfunc=sum(sumfunc,i);
66 sumstatement+i;
67 put (_all_) (=);
68 run;
i=. sumfunc=. sumstatement=0
i=1 sumfunc=1 sumstatement=1
i=2 sumfunc=3 sumstatement=3
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 65:11
Maxim 1: Read the Documentation.
From SUM Statement :
The variable is automatically set to 0 before SAS reads the first observation.
and
SAS treats an expression that produces a missing value as zero.
Both of these are not done if you use RETAIN without initialization, and a simple calculation with + in the assignment.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.