- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm confused about the way SAS handles missing values. I've recently realized I have to be very careful about assuming what SAS will do when it encounters a missing value. Here is a simple example.
data _null_; a = .; b = a + 1; c = sum(a,1); a+1; put a= b= c=; run;
The result is: a=1 b=. c=1
This means that adding 1 to missing with + results in missing, but adding 1 to missing with either the sum function or increment operator results in 1. Is there any logical reason for this behavior?
Thanks!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why the behaviors:
Some times you do not want to include the value but need the sum of non-missing so that is the behavior of the SUM function. For instance I have variables that represent sales of different products but not every customer buys every product. If I want to know how much a customer spent I want to total all non-missing product sales: SUM Function.
For some things I may require that I calculate the sum only when every value is populated. Is if fair to compare totals of quarterly sales for a salesman that only worked 1 quarter of the year and not 4? So I use the addition so that I only have sums for salesmen that worked all 4 quarters (obviously I would do something else for the other salesmen).
Since the primary purpose of the A+1; syntax is a counter it makes a lot of sense that after the first increment a "missing" has one added to indicate the count was completed. NOTE that the syntax also implies a RETAIN statement for the variable A and behaves quite differently.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
What you expect when saying logical behaviour ?
YOu can find documentation which explains each one of your examples:
sum statement : http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000289454.htm
sum function: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245953.htm
sum operator: http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001334675.htm
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the documentation references. I was expecting all three of the the methods that add one to the variable 'a' to give me the same result. Also I mistakenly thought that the sum statement, "a+1;", is equivalent to "a = a +1;"
In fact,
the sum statement is equivalent to using the SUM function and the RETAIN statement, as shown here:
retain variable 0; variable=sum(variable,expression);
Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
SAS functions ignore missing values in their computation. So with the SUM function, the sum of 2 and a missing value is 2, whether it is missing with the + operator: 2 + . = . If you read on support the SUM statement confirm you're argumentation. Here there's a clear paper on topic.
hope this helps
have nice day
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The documents mentioned explain the difference between SUM FUNCTION and SUM STATEMENT.
A simple way is to look at the following two data steps.
The first data step works with SUM STATEMENT.
data _null_;
b = a + 1;
put b = ;
run;
When the data step starts, both A and B are assigned with missing values. Even if you insert a statement,( A = .; ) as a first line in the program, still a is missing but avoids a NOTE(Variable A is uninitialized).
So, in Sum statement, using a missing value results in a missing value. This is how SAS works. You can explore the use of RETAIN Statement further.
The second works with the SUM Function:
At the beginning of the data step both A and C are assigned with missing values. When the data step COMPILES, the use of SUM function is noted, and it makes A to take zero value when the data step starts execution( the RETAIN statement is used with zero, behind the scene). Hence, C becomes zero + 1. However, SAS writes a NOTE(Variable A is uninitialized).
Hence, SAS treats SUM Statement and SUM Function differently.
data _null_;
c = sum(a, 1);
put c = ;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Why the behaviors:
Some times you do not want to include the value but need the sum of non-missing so that is the behavior of the SUM function. For instance I have variables that represent sales of different products but not every customer buys every product. If I want to know how much a customer spent I want to total all non-missing product sales: SUM Function.
For some things I may require that I calculate the sum only when every value is populated. Is if fair to compare totals of quarterly sales for a salesman that only worked 1 quarter of the year and not 4? So I use the addition so that I only have sums for salesmen that worked all 4 quarters (obviously I would do something else for the other salesmen).
Since the primary purpose of the A+1; syntax is a counter it makes a lot of sense that after the first increment a "missing" has one added to indicate the count was completed. NOTE that the syntax also implies a RETAIN statement for the variable A and behaves quite differently.