This is roughly the DATA step I am running:
DATA work.dataset;
SET db.table;
WHERE StringVariable = 'LOAD' AND NumberVariable = 1;
RUN;
In db.table, NumberVariable takes the value of either 0 or -1. Running this DATA step, one would think that this would bring in 0 observations since there is no instance where NumberVariable=1. However, It does load all observations into work.dataset where NumberVariable= -1 but not where NumberVariable = 0, so it's as if the WHERE statement only looks at the absolute value of NumberVariable
Both Format and Informat of NumberVariable is "2." If I remove both the format and informat (i.e. so that running a PROC CONTENTS of this dataset shows blank spaces under the FORMAT and INFORMAT columns for NumberVariable), it still brings in observations where NumberVariable = -1.
HOWEVER, when I use an IF statement instead of the WHERE statement, so that it reads:
DATA work.dataset;
SET db.table;
IF StringVariable = 'LOAD' AND NumberVariable = 1;
RUN;
….then 0 observations are read into work.dataset, as one would think as there are no observations in db.table where NumberVariable =1, only those where it =0 or =-1
Does anyone know why this is?
Can you provide a simple example, including a dataset in a data step (like the following) where this occurs? When I run the following I don't get any records. Do you if you run it?
data table;
input StringVariable $ NumberVariable;
cards;
LOAD 0
LOAD 0
LOAD -1
LOAD 0
LIGHT -1
LOAD 0
LOAD -1
;
DATA work.dataset;
SET table;
WHERE StringVariable = 'LOAD' AND NumberVariable = 1;
RUN;
Thanks for your reply.
When I run your test code, I get the same results, 0 records in work.dataset.
I wonder if it has to do with how the data are stored on the database from which I am pulling the data? I believe our IT dept. stores these data on our servers using SQL. I am not terribly experienced with database management, so I don't want to say too much about unfamiliar waters.
Going back to my original example, if I PROC PRINT on the dataset in my work folder:
PROC PRINT DATA= work.dataset;
WHERE NumberVariable = 1;
RUN;
There were 0 observations read from the data set, according to my Log window.
However, if I PROC PRINT on the original dataset, db.table:
PROC PRINT DATA= db.table;
WHERE NumberVariable = 1;
RUN;
Then all observations in db.table are read, even though for every observation in the Ouptut window, NumberVariable is -1
Since the data are coming from SQL Server, one possibility is that they are actually boolean values. In SQL Server true is represented as -1, where in SAS it is +1. Are you using an older version of SAS (e.g., ver 8)? I ask because there is a hotfix to correct for an optimizer problem related to that specific issue. It was supposed to be corrected in ver 9.
This is entirely possible. So far in other SQL server datasets at my company that I have encountered in SAS, binary variables are also either 0 or -1. I'm using SAS 9.2
Is SAS reading "WHERE NumberVariable=1" as "bring in observations from the SQL server where NumberVariable is true"? But then when the data are stored in a SAS dataset in my work folder, (in work.dataset), what was "true" is now stored as the number "-1" and SAS treats the variable NumberVariable in work.dataset as a number variable and not a boolean, so that -1 and 0 are disassociated from their orignial "true" and "false" meanings?
Steve, If that is indeed the issue, there is a saving grace. SAS treats not zero as true. Thus you could get around it by specifying the logic as boolean. For example:
data table;
format NumberVariable 2.;
informat NumberVariable 2.;
input StringVariable $ NumberVariable;
cards;
LOAD 0
LOAD 0
LOAD -1
LOAD 0
LIGHT -1
LOAD 0
LOAD -1
;
DATA work.dataset;
SET table;
WHERE StringVariable = 'LOAD' AND NumberVariable;
RUN;
Excellent, I imagine that this is the issue, and I will also consult with our DB admins.
Thank you very much for your assistance!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.