SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Adam_Black
Obsidian | Level 7

I'm confused about the way SAS handles missing values. I've recently realized I have to be very careful about assuming what SAS will do when it encounters a missing value. Here is a simple example.

data _null_;
a = .;
b = a + 1;
c = sum(a,1);
a+1;

put a= b= c=;
run;

The result is: a=1 b=. c=1

This means that adding 1 to missing with + results in missing, but adding 1 to missing with either the sum function or increment operator results in 1. Is there any logical reason for this behavior?

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Why the behaviors:

 

Some times you do not want to include the value but need the sum of non-missing so that is the behavior of the SUM function. For instance I have variables that represent sales of different products but not every customer buys every product. If I want to know how much a customer spent I want to total all non-missing product sales: SUM Function.

 

For some things I may require that I calculate the sum only when every value is populated. Is if fair to compare totals of quarterly sales for a salesman that only worked 1 quarter of the year and not 4? So I use the addition so that I only have sums for salesmen that worked all 4 quarters (obviously I would do something else for the other salesmen).

 

Since the primary purpose of the A+1; syntax is a counter it makes a lot of sense that after the first increment a "missing" has one added to indicate the count was completed. NOTE that the syntax also implies a RETAIN statement for the variable A and behaves quite differently.

View solution in original post

5 REPLIES 5
Adam_Black
Obsidian | Level 7

Thanks for the documentation references. I was expecting all three of the the methods that add one to the variable 'a' to give me the same result. Also I mistakenly thought that the sum statement, "a+1;",  is equivalent to "a = a +1;"

 

In fact,

 

the sum statement is equivalent to using the SUM function and the RETAIN statement, as shown here:

retain variable 0;
variable=sum(variable,expression);

 

Thanks.

MC1985
Obsidian | Level 7

SAS functions ignore missing values in their computation. So with the SUM function, the sum of 2 and a missing value is 2, whether it is missing with the + operator: 2 + . = . If you read on support the SUM statement confirm you're argumentation. Here there's a clear paper on topic.

hope this helps

have nice day

Martino Crippa
KachiM
Rhodochrosite | Level 12

The documents mentioned explain the difference between SUM FUNCTION and SUM STATEMENT.

 

A simple way is to look at the following two data steps.

 

The first data step works with SUM STATEMENT.

data _null_;
   b = a + 1;
   put b = ;
run;

 

When the data step starts, both A and B are assigned with missing values. Even if you insert a statement,( A = .; ) as a first line in the program, still a is missing but avoids a NOTE(Variable A is uninitialized).

So, in Sum statement, using a missing value results in a missing value. This is how SAS works. You can explore the use of RETAIN Statement further.

 

The second works with the SUM Function:

At the beginning of the data step both A and C are assigned with missing values. When the data step COMPILES, the use of SUM function is noted, and it makes A to take zero value when the data step starts execution( the RETAIN statement is used with zero, behind the scene). Hence, C becomes zero + 1. However, SAS writes a NOTE(Variable A is uninitialized).

 

Hence, SAS treats SUM Statement and SUM Function differently.

 

 

data _null_;
   c = sum(a, 1);
   put c = ;
run;
ballardw
Super User

Why the behaviors:

 

Some times you do not want to include the value but need the sum of non-missing so that is the behavior of the SUM function. For instance I have variables that represent sales of different products but not every customer buys every product. If I want to know how much a customer spent I want to total all non-missing product sales: SUM Function.

 

For some things I may require that I calculate the sum only when every value is populated. Is if fair to compare totals of quarterly sales for a salesman that only worked 1 quarter of the year and not 4? So I use the addition so that I only have sums for salesmen that worked all 4 quarters (obviously I would do something else for the other salesmen).

 

Since the primary purpose of the A+1; syntax is a counter it makes a lot of sense that after the first increment a "missing" has one added to indicate the count was completed. NOTE that the syntax also implies a RETAIN statement for the variable A and behaves quite differently.

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 5085 views
  • 2 likes
  • 5 in conversation