BookmarkSubscribeRSS Feed
TashaChapman14
Obsidian | Level 7

A friend of mine brought this to my attention and I was quite surprised by it.  As noted in the link below, "In a comparison, missing numeric values are lower than any other numeric value." 

 

https://documentation.sas.com/?docsetId=basess&docsetTarget=n0rpmy8a96esqwn1a93fb7vs06nm.htm&docsetV...

 

In other words, if you had code such as "If TotalCost lt 2000" you would get all values below 2000, including all observations with missing values for TotalCost.  Huh?

 

This seems odd to me, and also doesn't seem consistent with the rest of SAS programming.  For example, this isn't how PROC Format treats comparisons. This also isn't consistent with other programming languages, such as R.


Thoughts?  Why is it like this?  Am I the only one that thinks this is weird?

16 REPLIES 16
unison
Lapis Lazuli | Level 10

This seems odd to me, and also doesn't seem consistent with the rest of SAS programming.

What do you mean by consistency? The treatment of missing as low vs. high or the inclusion/exclusion of missing by default?

 

-unison

-unison
TashaChapman14
Obsidian | Level 7
I think I mostly meant the inclusion/exclusion of missing by default. I expected missing data to be excluded or treated as distinctly different (below I give a few examples). That is how PROC FORMAT treats missing data, that is how every statistics and analytics procedure (to my immediate recollection) treats missing data... This really surprised me.
unison
Lapis Lazuli | Level 10

I expect that the "low" treatment of missing by default was chosen so that these observations appear on top when sorted ascending so that the programmer notices them and can handle them as he/she wishes.

 

In terms of "inclusion/exclusion by default" in if/then processing are you saying that you'd expect SAS:

  • to skip over (i.e. exclude) missing observations when processing your command(s) or
  • to delete them altogether

-unison

-unison
SASKiwi
PROC Star

This isn't weird at all. There is no standard across different software products as to how missing or NULL values are treated. Each product has its own rules. As long as these are clearly documented then I don't see a problem.

 

I know some products that treat NULL values as the highest possible value by default like Oracle. A rule like this is necessary so that data can be predictably sorted. By SAS enforcing a missing value as the lowest possible value, then these will always sort first and that seems quite logical to me.

Reeza
Super User

It's an arbitrary decision similar to should arrays be indexed from 0/1. 

You can choose to exclude missing values, or include them as higher or lower than everything else. In this case SAS went with lower. 

 

SAS is a bit older than R, it started in 1966 to 1976 while S started in 1976 and then R was started in 1990 as a open source spin off from S. 

 

Personally, I'm okay with it being the lowest because if you then sort and view it, it's the first in the data set so you remember to deal with it, whereas otherwise if it's at the end it's easier to forget about it.

 

 

 

 

ballardw
Super User

@TashaChapman14 wrote:

A friend of mine brought this to my attention and I was quite surprised by it.  As noted in the link below, "In a comparison, missing numeric values are lower than any other numeric value." 

 

https://documentation.sas.com/?docsetId=basess&docsetTarget=n0rpmy8a96esqwn1a93fb7vs06nm.htm&docsetV...

 

In other words, if you had code such as "If TotalCost lt 2000" you would get all values below 2000, including all observations with missing values for TotalCost.  Huh?

 

This seems odd to me, and also doesn't seem consistent with the rest of SAS programming.  For example, this isn't how PROC Format treats comparisons.


Can you provide an example of how Proc Format doesn't do this? Provide a format and values used for testing.

 

It has been ages since I did any FORTRAN (a programming language around since 1957) coding on a regular basis but I seem to recall the program FAILING if arithmetic or logic was attempted with a variable that did not have any value (non-missing) assigned. And I think COBOL (again around since the 50's) behaves similarly unless specific complier directives are used. So be grateful that the programming language will allow you to use undefined values.

TashaChapman14
Obsidian | Level 7

Here's a perfect example of how PROC Format doesn't follow these rules.  If you run the code below (which includes the dataset in the code), you'll see that the second observation with the missing testscore is always treated as missing data, not "lower than the lowest numeric value."  It is never coded as a "Failing grade".  It is always distinctly different.

 

However, the last data step and proc print show how the comparison operators in the DATA step use different logic, and thus code the value as a failing grade. 

 

If I'd had my druthers, it would have kept it as a missing value.  This is how SAS treats any other assignment statement equation.  If I said "Grade = Testscore * 22", if testscore was missing then grade would have been missing.  Seems reasonable to me.

 

I seem to be in the minority here, but I'll die on this hill.  🙂 

 

 

 

data test;
infile datalines missover;
input id $ testscore;
datalines;
1 98
2 
3 27
4 52
5 68
6 88
7 92
8 78
;

proc format;
value grades
low-59 = 'Failing grade'
60-high = 'Passing grade';
run;

data applyfmt; set test;
format testscore grades.;
run;

proc print data=applyfmt;
run;

proc freq data=applyfmt;
table testscore;
run;

proc freq data=test;
format testscore grades.;
table testscore;
run;

data applycomp; set test;
if testscore le 59 then Grade = 'Failing Grade';
else if testscore gt 59 then Grade = 'Passing Grade';
run;

proc print data=applycomp;
run;
ballardw
Super User
data applycomp; 
   set test;
   if constant('small') le testscore le 59 then Grade = 'Failing Grade';
   else if testscore gt 59 then Grade = 'Passing Grade';
run;

Is basically the equivalent of the Format used.

Tom
Super User Tom
Super User

Not any stranger than other methods of dealing with missing data.  For example many database systems treat missing values are neither smaller nor greater.. That can yield really strange results if you are not expecting it.  For example if you count those less 5 and those 5 or greater the sum of the two counts might not be the same as the total count.  Or worse the equality operator doesn't work. Missing values In those systems not equal.  You cannot join on them compare them.  

 

Note that SAS actually has 28 missing values. And the inequality (and equality) operators work on all of them.

._ < . < .A < ... < .Z

 

 

TashaChapman14
Obsidian | Level 7
That last bit.... Mind blown...
Reeza
Super User

I concur that PROC FORMAT may be inconsistent to some degree because the documentation says this for the range:

In numeric ranges, LOW includes the lowest numeric value, excluding missing values. PROC FREQ is working correctly but PROC PRINT seems to apply it as if it is included....You should always deal with Missing values by having MISSING in your formats is the lesson then...I always have an OTHER coded to CHECKME for these types of cases.

https://documentation.sas.com/?docsetId=proc&docsetTarget=n03qskwoints2an1ispy57plwrn9.htm&docsetVer...

Yet it's displayed as Failing Grade in PROC PRINT.

TashaChapman14
Obsidian | Level 7
Quick check, are you looking at the PROC PRINT of APPLYFMT (which should show the formatted value of TESTSCORE as missing) or APPLYCOMP (which shows the value of GRADE as "failing grade")?
Reeza
Super User
Right...ok, then SAS is doing it correctly, according to the documentation or at least then the exception here is a documented difference that is noted. Proc Format has the MISSING to deal with that.

Try repeating your example with a character variable now 🙂
Because that does include missing in the range.

I concur that this may be confusing, but it is designed that way and part of me thinks if you're using PROC FORMAT to categorize your data accepting that you explicitly need to deal with missing data is fine.
TashaChapman14
Obsidian | Level 7
Yes, I do eat, sleep, and breathe PROC Format, which may have influenced my expectations with how the DATA step would handle missing data.

In the end, I'm not saying it's wrong. It's just not intuitive.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 3919 views
  • 8 likes
  • 6 in conversation