turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Confused by how sas handles missing values

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-06-2016 08:37 AM - edited 07-06-2016 08:38 AM

I'm confused about the way SAS handles missing values. I've recently realized I have to be very careful about assuming what SAS will do when it encounters a missing value. Here is a simple example.

data _null_; a = .; b = a + 1; c = sum(a,1); a+1; put a= b= c=; run;

The result is: a=1 b=. c=1

This means that adding 1 to missing with + results in missing, but adding 1 to missing with either the sum function or increment operator results in 1. Is there any logical reason for this behavior?

Thanks!

Accepted Solutions

Solution

07-06-2016
12:44 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Adam_Black

07-06-2016 11:34 AM

Why the behaviors:

Some times you do not want to include the value but need the sum of non-missing so that is the behavior of the SUM function. For instance I have variables that represent sales of different products but not every customer buys every product. If I want to know how much a customer spent I want to total all non-missing product sales: SUM Function.

For some things I may require that I calculate the sum only when every value is populated. Is if fair to compare totals of quarterly sales for a salesman that only worked 1 quarter of the year and not 4? So I use the addition so that I only have sums for salesmen that worked all 4 quarters (obviously I would do something else for the other salesmen).

Since the primary purpose of the A+1; syntax is a counter it makes a lot of sense that after the first increment a "missing" has one added to indicate the count was completed. NOTE that the syntax also implies a RETAIN statement for the variable A and behaves quite differently.

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Adam_Black

07-06-2016 09:06 AM

Hello,

What you expect when saying logical behaviour ?

YOu can find documentation which explains each one of your examples:

**sum statement** : http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000289454.htm

**sum function**: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245953.htm

**sum operator**: http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001334675.htm

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Loko

07-06-2016 12:54 PM

Thanks for the documentation references. I was expecting all three of the the methods that add one to the variable 'a' to give me the same result. Also I mistakenly thought that the sum statement, "a+1;", is equivalent to "a = a +1;"

In fact,

the sum statement is equivalent to using the SUM function and the RETAIN statement, as shown here:

retain variable 0; variable=sum(variable,expression);

Thanks.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Adam_Black

07-06-2016 09:18 AM

** SAS functions ignore missing values in their computation.** So with the SUM function, the sum of 2 and a missing value is 2, whether it is missing with the + operator: 2 + . = . If you read on support the SUM statement confirm you're argumentation. Here there's a clear paper on topic.

hope this helps

have nice day

Martino Crippa

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Adam_Black

07-06-2016 10:16 AM

The documents mentioned explain the difference between SUM FUNCTION and SUM STATEMENT.

A simple way is to look at the following two data steps.

The first data step works with SUM STATEMENT.

```
data _null_;
b = a + 1;
put b = ;
run;
```

When the data step starts, both A and B are assigned with missing values. Even if you insert a statement,( A = .; ) as a first line in the program, still a is missing but avoids a NOTE(Variable A is uninitialized).

So, in Sum statement, using a missing value results in a missing value. This is how SAS works. You can explore the use of RETAIN Statement further.

The second works with the SUM Function:

At the beginning of the data step both A and C are assigned with missing values. When the data step COMPILES, the use of SUM function is noted, and it makes A to take zero value when the data step starts execution( the RETAIN statement is used with zero, behind the scene). Hence, C becomes zero + 1. However, SAS writes a NOTE(Variable A is uninitialized).

Hence, SAS treats SUM Statement and SUM Function differently.

```
data _null_;
c = sum(a, 1);
put c = ;
run;
```

Solution

07-06-2016
12:44 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Adam_Black

07-06-2016 11:34 AM

Why the behaviors:

Some times you do not want to include the value but need the sum of non-missing so that is the behavior of the SUM function. For instance I have variables that represent sales of different products but not every customer buys every product. If I want to know how much a customer spent I want to total all non-missing product sales: SUM Function.

For some things I may require that I calculate the sum only when every value is populated. Is if fair to compare totals of quarterly sales for a salesman that only worked 1 quarter of the year and not 4? So I use the addition so that I only have sums for salesmen that worked all 4 quarters (obviously I would do something else for the other salesmen).

Since the primary purpose of the A+1; syntax is a counter it makes a lot of sense that after the first increment a "missing" has one added to indicate the count was completed. NOTE that the syntax also implies a RETAIN statement for the variable A and behaves quite differently.