I ran into this issue recently after I renamed the ID variable in one of my files. As in the example below, I was attempting to restrict the cases in one file to the cases that were present in another file. But because the ID variable name was different, SAS ignored the Where statement. What really bothered me is that there was no error or even warning in the log. It would be very easy to overlook this problem and think that the output file has the proper cases when it doesn't.
I assume this is a known issue, but I had never heard of it. I tried searching, but I didn't find anything. Can someone point me to more information about this?
In the example below, the 1st file has 4 cases and the 2nd file has 3 cases. The PROC SQL is intended to restrict the 1st file to just those 3 cases. Instead, because the ID variable has been renamed in the 2nd file, PROC SQL returns all 4 cases with no errors in the log.
data has_ID_var;
input ID test_flag;
datalines;
1234 1
2345 0
3456 0
4567 1
; run;
data has_personID_var;
input personID;
datalines;
1234
2345
3456
; run;
PROC SQL;
Create table output
as Select *
From has_ID_var
Where ID in (select ID from has_personID_var);
quit;
It is just doing what you asked.
Since ID does not exist in the second dataset it uses the only ID variable there is.
You have effectively run the same thing as this code:
PROC SQL;
Create table output
as Select *
From has_ID_var
Where 'xx' in (select 'xx' from has_personID_var);
quit;
Use table aliases to make it clear which ID variable you are talking about.
49 PROC SQL; 50 Create table output 51 as Select * 52 From has_ID_var a 53 Where a.id in (select b.id from has_personID_var b); ERROR: Column id could not be found in the table/view identified with the correlation name B. ERROR: Unresolved reference to table/correlation name b. ERROR: Expression using IN has components that are of different data types. NOTE: The IN referred to may have been transformed from an OR to an IN at some point during PROC SQL WHERE clause optimization. 54 quit; NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE SQL used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
This looks like a bug, and if I were you, I would contact SAS technical support.
I tested the subquery alone and it does produce an ERROR revealing the issue:
34 proc sql;
35 select ID from has_personID_var;
ERROR: The following columns were not found in the contributing tables: ID.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
36 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
However, the outer query does not see this as an issue. Here is a related post: https://communities.sas.com/t5/SAS-Procedures/SQL-subquery-wrong-variable-name-no-error/td-p/606354
Hello, @antonbcristina
I ran the exact code provided by @Wolverine and no errors appeared. I am using SAS 9.4 TS Level 1M5. What version of SAS are you running?
Hi @PaigeMiller, you're right the original code provided returned no errors. The error only appears if you run the subquery alone:
PROC SQL;
select ID
from has_personID_var;
quit;
I'm running this on the same version of SAS.
@antonbcristina wrote:
Hi @PaigeMiller, you're right the original code provided returned no errors. The error only appears if you run the subquery alone:
PROC SQL; select ID from has_personID_var; quit;
I'm running this on the same version of SAS.
That code should generate an error. There is no variable ID in Has_personId_var.
Since you renamed the variable then I suspect that you want to change
Where ID in (select ID from has_personID_var);
to
Where ID in (select personID as ID from has_personID_var);
so there is a value of ID to find from Has_personID_var.
It is just doing what you asked.
Since ID does not exist in the second dataset it uses the only ID variable there is.
You have effectively run the same thing as this code:
PROC SQL;
Create table output
as Select *
From has_ID_var
Where 'xx' in (select 'xx' from has_personID_var);
quit;
Use table aliases to make it clear which ID variable you are talking about.
49 PROC SQL; 50 Create table output 51 as Select * 52 From has_ID_var a 53 Where a.id in (select b.id from has_personID_var b); ERROR: Column id could not be found in the table/view identified with the correlation name B. ERROR: Unresolved reference to table/correlation name b. ERROR: Expression using IN has components that are of different data types. NOTE: The IN referred to may have been transformed from an OR to an IN at some point during PROC SQL WHERE clause optimization. 54 quit; NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE SQL used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
It is just doing what you asked.
I disagree. I asked SAS to find a specific variable from a specific file (ie, "select ID from has_personID_var"). But since it couldn't find that variable in that file, it took the data from a different file. At the very least, it should have given me a warning that it was using a different file than the file I instructed it to use.
Use table aliases to make it clear which ID variable you are talking about.
That is a simple and effective solution, and that is how I will be writing these types of Where statements from now on.
Thanks everyone for providing links and explanations!
Your mental model for how SQL works it too limited.
In a sub query you can reference any varaible, not just those that come from the datasets mentioned in the FROM clause of a SUBQUERY.
PROC SQL allows you to reference variables without specifying the dataset (table) they come from, but if you want to use that to reduce the amount of typing you need to do then you need to understand that there is a risk it will make the wrong choice.
Hello @Wolverine,
Here's another discussion (from 2021) of exactly this issue, with explanations: https://communities.sas.com/t5/SAS-Programming/Proc-sql-giving-result-when-expected-not-to/m-p/72350...
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.