DATA Step, Macro, Functions and more

Confused by how sas handles missing values

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 13
Accepted Solution

Confused by how sas handles missing values

[ Edited ]

I'm confused about the way SAS handles missing values. I've recently realized I have to be very careful about assuming what SAS will do when it encounters a missing value. Here is a simple example.

data _null_;
a = .;
b = a + 1;
c = sum(a,1);
a+1;

put a= b= c=;
run;

The result is: a=1 b=. c=1

This means that adding 1 to missing with + results in missing, but adding 1 to missing with either the sum function or increment operator results in 1. Is there any logical reason for this behavior?

Thanks!


Accepted Solutions
Solution
‎07-06-2016 12:44 PM
Super User
Posts: 10,497

Re: Confused by how sas handles missing values

Why the behaviors:

 

Some times you do not want to include the value but need the sum of non-missing so that is the behavior of the SUM function. For instance I have variables that represent sales of different products but not every customer buys every product. If I want to know how much a customer spent I want to total all non-missing product sales: SUM Function.

 

For some things I may require that I calculate the sum only when every value is populated. Is if fair to compare totals of quarterly sales for a salesman that only worked 1 quarter of the year and not 4? So I use the addition so that I only have sums for salesmen that worked all 4 quarters (obviously I would do something else for the other salesmen).

 

Since the primary purpose of the A+1; syntax is a counter it makes a lot of sense that after the first increment a "missing" has one added to indicate the count was completed. NOTE that the syntax also implies a RETAIN statement for the variable A and behaves quite differently.

View solution in original post


All Replies
Super Contributor
Posts: 305

Re: Confused by how sas handles missing values

Occasional Contributor
Posts: 13

Re: Confused by how sas handles missing values

Thanks for the documentation references. I was expecting all three of the the methods that add one to the variable 'a' to give me the same result. Also I mistakenly thought that the sum statement, "a+1;",  is equivalent to "a = a +1;"

 

In fact,

 

the sum statement is equivalent to using the SUM function and the RETAIN statement, as shown here:

retain variable 0;
variable=sum(variable,expression);

 

Thanks.

Contributor
Posts: 34

Re: Confused by how sas handles missing values

SAS functions ignore missing values in their computation. So with the SUM function, the sum of 2 and a missing value is 2, whether it is missing with the + operator: 2 + . = . If you read on support the SUM statement confirm you're argumentation. Here there's a clear paper on topic.

hope this helps

have nice day

Martino Crippa
Super Contributor
Posts: 254

Re: Confused by how sas handles missing values

The documents mentioned explain the difference between SUM FUNCTION and SUM STATEMENT.

 

A simple way is to look at the following two data steps.

 

The first data step works with SUM STATEMENT.

data _null_;
   b = a + 1;
   put b = ;
run;

 

When the data step starts, both A and B are assigned with missing values. Even if you insert a statement,( A = .; ) as a first line in the program, still a is missing but avoids a NOTE(Variable A is uninitialized).

So, in Sum statement, using a missing value results in a missing value. This is how SAS works. You can explore the use of RETAIN Statement further.

 

The second works with the SUM Function:

At the beginning of the data step both A and C are assigned with missing values. When the data step COMPILES, the use of SUM function is noted, and it makes A to take zero value when the data step starts execution( the RETAIN statement is used with zero, behind the scene). Hence, C becomes zero + 1. However, SAS writes a NOTE(Variable A is uninitialized).

 

Hence, SAS treats SUM Statement and SUM Function differently.

 

 

data _null_;
   c = sum(a, 1);
   put c = ;
run;
Solution
‎07-06-2016 12:44 PM
Super User
Posts: 10,497

Re: Confused by how sas handles missing values

Why the behaviors:

 

Some times you do not want to include the value but need the sum of non-missing so that is the behavior of the SUM function. For instance I have variables that represent sales of different products but not every customer buys every product. If I want to know how much a customer spent I want to total all non-missing product sales: SUM Function.

 

For some things I may require that I calculate the sum only when every value is populated. Is if fair to compare totals of quarterly sales for a salesman that only worked 1 quarter of the year and not 4? So I use the addition so that I only have sums for salesmen that worked all 4 quarters (obviously I would do something else for the other salesmen).

 

Since the primary purpose of the A+1; syntax is a counter it makes a lot of sense that after the first increment a "missing" has one added to indicate the count was completed. NOTE that the syntax also implies a RETAIN statement for the variable A and behaves quite differently.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 374 views
  • 2 likes
  • 5 in conversation