BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Wolverine
Pyrite | Level 9

I ran into this issue recently after I renamed the ID variable in one of my files. As in the example below, I was attempting to restrict the cases in one file to the cases that were present in another file. But because the ID variable name was different, SAS ignored the Where statement. What really bothered me is that there was no error or even warning in the log. It would be very easy to overlook this problem and think that the output file has the proper cases when it doesn't.

 

I assume this is a known issue, but I had never heard of it. I tried searching, but I didn't find anything. Can someone point me to more information about this?

 

In the example below, the 1st file has 4 cases and the 2nd file has 3 cases. The PROC SQL is intended to restrict the 1st file to just those 3 cases. Instead, because the ID variable has been renamed in the 2nd file, PROC SQL returns all 4 cases with no errors in the log.

 

 

data has_ID_var;
    input ID test_flag;
    datalines;
    1234    1
    2345    0
    3456    0
    4567    1
; run;

data has_personID_var;
    input personID;
    datalines;
    1234   
    2345   
    3456   
; run;

PROC SQL;
  Create table output
    as Select *
  From has_ID_var
  Where ID in (select ID from has_personID_var);
quit;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

It is just doing what you asked.

 

Since ID does not exist in the second dataset it uses the only ID variable there is.

 

You have effectively run the same thing as this code:

PROC SQL;
  Create table output
    as Select *
  From has_ID_var
  Where 'xx' in (select 'xx' from has_personID_var);
quit;

Use table aliases to make it clear which ID variable you are talking about.

49   PROC SQL;
50     Create table output
51       as Select *
52     From has_ID_var a
53     Where a.id in (select b.id from has_personID_var b);
ERROR: Column id could not be found in the table/view identified with the correlation name B.
ERROR: Unresolved reference to table/correlation name b.
ERROR: Expression using IN has components that are of different data types.
NOTE: The IN referred to may have been transformed from an OR to an IN at some point during PROC SQL WHERE clause optimization.
54   quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

View solution in original post

9 REPLIES 9
PaigeMiller
Diamond | Level 26

This looks like a bug, and if I were you, I would contact SAS technical support. 

--
Paige Miller
antonbcristina
SAS Employee

I tested the subquery alone and it does produce an ERROR revealing the issue:

 

34         proc sql;
35            select ID from has_personID_var;
ERROR: The following columns were not found in the contributing tables: ID.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
36         quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

However, the outer query does not see this as an issue. Here is a related post: https://communities.sas.com/t5/SAS-Procedures/SQL-subquery-wrong-variable-name-no-error/td-p/606354 

 

PaigeMiller
Diamond | Level 26

Hello, @antonbcristina 

 

I ran the exact code provided by @Wolverine and no errors appeared. I am using SAS 9.4 TS Level 1M5. What version of SAS are you running?

--
Paige Miller
antonbcristina
SAS Employee

Hi @PaigeMiller, you're right the original code provided returned no errors. The error only appears if you run the subquery alone:

 

PROC SQL;
  select ID 
  from has_personID_var;
quit;

I'm running this on the same version of SAS.

ballardw
Super User

@antonbcristina wrote:

Hi @PaigeMiller, you're right the original code provided returned no errors. The error only appears if you run the subquery alone:

 

PROC SQL;
  select ID 
  from has_personID_var;
quit;

I'm running this on the same version of SAS.


That code should generate an error. There is no variable ID in Has_personId_var.

 

Since you renamed the variable then I suspect that you want to change

  Where ID in (select ID from has_personID_var);

to

  Where ID in (select personID as ID from has_personID_var);

so there is a value of ID to find from Has_personID_var.

Tom
Super User Tom
Super User

It is just doing what you asked.

 

Since ID does not exist in the second dataset it uses the only ID variable there is.

 

You have effectively run the same thing as this code:

PROC SQL;
  Create table output
    as Select *
  From has_ID_var
  Where 'xx' in (select 'xx' from has_personID_var);
quit;

Use table aliases to make it clear which ID variable you are talking about.

49   PROC SQL;
50     Create table output
51       as Select *
52     From has_ID_var a
53     Where a.id in (select b.id from has_personID_var b);
ERROR: Column id could not be found in the table/view identified with the correlation name B.
ERROR: Unresolved reference to table/correlation name b.
ERROR: Expression using IN has components that are of different data types.
NOTE: The IN referred to may have been transformed from an OR to an IN at some point during PROC SQL WHERE clause optimization.
54   quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

Wolverine
Pyrite | Level 9

It is just doing what you asked.

I disagree. I asked SAS to find a specific variable from a specific file (ie, "select ID from has_personID_var"). But since it couldn't find that variable in that file, it took the data from a different file. At the very least, it should have given me a warning that it was using a different file than the file I instructed it to use.

 

Use table aliases to make it clear which ID variable you are talking about.

That is a simple and effective solution, and that is how I will be writing these types of Where statements from now on.

 

Thanks everyone for providing links and explanations!

 

Tom
Super User Tom
Super User

Your mental model for how SQL works it too limited. 

 

In a sub query you can reference any varaible, not just those that come from the datasets mentioned in the FROM clause of a SUBQUERY.

 

PROC SQL allows you to reference variables without specifying the dataset (table) they come from, but if you want to use that to reduce the amount of typing you need to do then you need to understand that there is a risk it will make the wrong choice.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1575 views
  • 0 likes
  • 6 in conversation