I'm not sure if this is the correct board, if not can someone tell me where the appropriate board would be.
It has recently come to my attention when I use a negation statement missing/null values are unintentionally also removed from the data table. While for some projects this isn't a problem it is for others.
For example if I say where XYZ_IND ^= "E"
Both the "E" values and missing/null values are removed from my data table. In one instance missing/null values are VALID.
Is there something with my SAS setup or is it a quirk of SAS?
Thanks,
AggieGal
This does not sound like a SAS quirk, but an SQL quirk. Sounds like your data is stored on a non-SAS database. Most SQL databases use a half-hearted kind of three-pronged logic, where comparing NULL to something is kind of undefined (does not matter what operator you use); but "kind of undefined" comes out as "not true". So, even NULL=NULL comes out at "kind of not true", and you have to use the special operator "IS NULL".
SAS has a (I think) more reasonable approach, where a missing value is comparable, and a missing value is considered smaller than any non-missing value.
So if you want to get the result you want, you may have to change your where clause to
where XYZ_IND^="E" or XYZ_IND is null
A missing value is ^= "E"
perhaps you want
where xyz_ind^="E" and not missing(xyz_ind)
The problem is there are other indicators. It is just the "E"s that shouldn't be counted.
Ultimately I did an INCLUSIVE where statement to include blanks " " and missing since they are coded separately in the data.
@AggieGal wrote:
The problem is there are other indicators. It is just the "E"s that shouldn't be counted.
Ultimately I did an INCLUSIVE where statement to include blanks " " and missing since they are coded separately in the data.
You have given an extremely fragmentary description of the problem. So I'm not clear what you are saying or what you did.
How is it a "quirk"? If you have the character string "E" and the character string "" which is missing, how are they the same? Therefore a clause which states is "E"="" is obviously false, and thus the record gets removed.
Edit: I just saw your latest post. If you want to do it like that then simply:
where not(xy_ind="E");
I.e. if xy_ind=E then don't output.
That's the thing they aren't the same by any stretch of the imagination.
When I tell SAS ^= "E" the missing/null values are ALSO being removed. I just need the E's removed.
I'm trying to figure out if it is an annoying quirk in SAS or if there is something with my SAS setup.
No its not a quirk, it is simple logic. E does not equal missing, nor does it equal null, nor does it equal A, b, 1, 2, =, or ! or any other character. Therefore you logic states, where something is not equal to E, means anything which is not E is dropped. There is nothing wrong with your setup, there is no quirk in SAS. It is just following what you have told it to do.
Still seems like my code in message #2 does what you want.
As far as the possibility that this is a quirk, I'd advise against that thought process, SAS is doing exactly what you asked it to do (and as far as I know, all programming languages would behave the same).
As far as I know when I say I want EVERYTHING but one thing, EVERYTHING but the one thing should be included in the data set.
Take the following
XYZ_IND can equal A, B, C, D, E, F, (etc), " " (bank), or missing/null.
When I tell SAS Where XYZ_IND ^= "E" I should get all the above BUT E. However the missing/null values are ALSO being removed.
This is where the hangup is. When I am doing a WHERE CLAUSE the missing/null are being removed in ADDITION to what I don't want. I'm trying to figure out is why this is happening.
Please share a small portion of your actual data, as SAS data step code, as indicated at this link:
Also please share your complete code for the data step or PROC where you want this WHERE clause to work.
Your expectations are right on target.
The portion of code that you presented looks fine, although this might be clearer to read:
if xyz_ind="E" then delete;
Perhaps the problem lies in some other portion of the code that you have not shared at this point. For example, some procedures automatically throw out missing values of variables that appear in a CLASS statement. But from everything you have posted so far, the results should match what you want (even though we know they do not).
This does not sound like a SAS quirk, but an SQL quirk. Sounds like your data is stored on a non-SAS database. Most SQL databases use a half-hearted kind of three-pronged logic, where comparing NULL to something is kind of undefined (does not matter what operator you use); but "kind of undefined" comes out as "not true". So, even NULL=NULL comes out at "kind of not true", and you have to use the special operator "IS NULL".
SAS has a (I think) more reasonable approach, where a missing value is comparable, and a missing value is considered smaller than any non-missing value.
So if you want to get the result you want, you may have to change your where clause to
where XYZ_IND^="E" or XYZ_IND is null
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.