I'm confused about the way SAS handles missing values. I've recently realized I have to be very careful about assuming what SAS will do when it encounters a missing value. Here is a simple example.
data _null_; a = .; b = a + 1; c = sum(a,1); a+1; put a= b= c=; run;
The result is: a=1 b=. c=1
This means that adding 1 to missing with + results in missing, but adding 1 to missing with either the sum function or increment operator results in 1. Is there any logical reason for this behavior?
Thanks!
Why the behaviors:
Some times you do not want to include the value but need the sum of non-missing so that is the behavior of the SUM function. For instance I have variables that represent sales of different products but not every customer buys every product. If I want to know how much a customer spent I want to total all non-missing product sales: SUM Function.
For some things I may require that I calculate the sum only when every value is populated. Is if fair to compare totals of quarterly sales for a salesman that only worked 1 quarter of the year and not 4? So I use the addition so that I only have sums for salesmen that worked all 4 quarters (obviously I would do something else for the other salesmen).
Since the primary purpose of the A+1; syntax is a counter it makes a lot of sense that after the first increment a "missing" has one added to indicate the count was completed. NOTE that the syntax also implies a RETAIN statement for the variable A and behaves quite differently.
Hello,
What you expect when saying logical behaviour ?
YOu can find documentation which explains each one of your examples:
sum statement : http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000289454.htm
sum function: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245953.htm
sum operator: http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001334675.htm
Thanks for the documentation references. I was expecting all three of the the methods that add one to the variable 'a' to give me the same result. Also I mistakenly thought that the sum statement, "a+1;", is equivalent to "a = a +1;"
In fact,
the sum statement is equivalent to using the SUM function and the RETAIN statement, as shown here:
retain variable 0; variable=sum(variable,expression);
Thanks.
SAS functions ignore missing values in their computation. So with the SUM function, the sum of 2 and a missing value is 2, whether it is missing with the + operator: 2 + . = . If you read on support the SUM statement confirm you're argumentation. Here there's a clear paper on topic.
hope this helps
have nice day
The documents mentioned explain the difference between SUM FUNCTION and SUM STATEMENT.
A simple way is to look at the following two data steps.
The first data step works with SUM STATEMENT.
data _null_;
b = a + 1;
put b = ;
run;
When the data step starts, both A and B are assigned with missing values. Even if you insert a statement,( A = .; ) as a first line in the program, still a is missing but avoids a NOTE(Variable A is uninitialized).
So, in Sum statement, using a missing value results in a missing value. This is how SAS works. You can explore the use of RETAIN Statement further.
The second works with the SUM Function:
At the beginning of the data step both A and C are assigned with missing values. When the data step COMPILES, the use of SUM function is noted, and it makes A to take zero value when the data step starts execution( the RETAIN statement is used with zero, behind the scene). Hence, C becomes zero + 1. However, SAS writes a NOTE(Variable A is uninitialized).
Hence, SAS treats SUM Statement and SUM Function differently.
data _null_;
c = sum(a, 1);
put c = ;
run;
Why the behaviors:
Some times you do not want to include the value but need the sum of non-missing so that is the behavior of the SUM function. For instance I have variables that represent sales of different products but not every customer buys every product. If I want to know how much a customer spent I want to total all non-missing product sales: SUM Function.
For some things I may require that I calculate the sum only when every value is populated. Is if fair to compare totals of quarterly sales for a salesman that only worked 1 quarter of the year and not 4? So I use the addition so that I only have sums for salesmen that worked all 4 quarters (obviously I would do something else for the other salesmen).
Since the primary purpose of the A+1; syntax is a counter it makes a lot of sense that after the first increment a "missing" has one added to indicate the count was completed. NOTE that the syntax also implies a RETAIN statement for the variable A and behaves quite differently.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.