I wanted to subset data from a data set in an Oracle data base. "Where" and "If" clauses then gave me different results. Please see below:
data a; set odsstu.registration_course;
WHERE grade_type not in ('A','X');
run;
data b; set odsstu.registration_course;
IF grade_type not in ('A','X');
run;
In data set a, I got only records whose grade_type was not blank and whose grade_type was not A/X. In data set b, I got records whose grade_type not A/X, including blank grade_type. That means data set b had more records than data set a.
I know that WHERE applies before the data get to the PDV, and IF applies after the data get to the PDV. However, I still cannot figure out why WHERE and IF gave me different results as described above.
The next step, I figured out that I needed to bring the Oracle data set into SAS first if I wanted to get the same results using WHERE. See below:
data test; set odsstu.registration_course; run; /* bring "odsstu.registration_course" from Oracle to SAS */
data c; set test;
WHERE grade_type not in ('A','X');
run;
Now my data set c would have the same number of records as data set b above. Does anyone have any ideas why this happened? Many thanks for any ideas you can share.
SAS and Oracle treat missing data differently. Null is not evaluated in comparison and Oracle ignores observations where field has Null values. While SAS interprets both Nulls and blanks coming from Oracle as missing values.
@ stat@sas The difference is a bit more complicated DBMS systems are using a three value logic not binary true/false. The concept or that reason is not existing). http://en.wikipedia.org/wiki/Null_(SQL). Within statistics it can be very logical to have different reasons a values is not applicable (multiple missing codings) eg not applicable/ refused to answer / failed to get to this question.
Stasticians and SQL DBMS technical issues is often not the best combination.
use the following options to see what SQL statement SAS have passed to ORACLE .
options sastrace=',,d,' sastraceloc=saslog nostsuffix;
options sastrace=' ,,d,' sastraceloc=saslog nostsuffix;
To get the same results from Oracle (or really almost any non-SAS database system) you need to be explicit about how to handle missing values.
To SAS grade_type not in ('A','X') will include cases where GRADE_TYPE is null (missing).
Change your WHERE clause to
grade_type not in ('A','X') or grade_type is null
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.