- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi
When I run the below proc sql code it returns results and I did expect it to give an error. Can anyone explain why?
Running SAS 9.4 on Unix. I expected it to fail since variable sex2 is not in sashelp.class but instead work.class with 10 obs was returned. Have entered my log notes as well as the code.
*-- Creating subset of data renaming variable sex to sex2 --*; data work.class(rename=(sex=sex2)); set sashelp.class(where=(sex='M')); run; NOTE: There were 10 observations read from the data set SASHELP.CLASS. WHERE sex='M'; NOTE: The data set WORK.CLASS has 10 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds *-- Selecting from subset using sex2 in (select sex2) --*; *-- Expected this to fail instead class2 is returned --*; *-- as result --*; proc sql; create table class2 as select * from class where sex2 in(select sex2 from sashelp.class); /* sex2 is not in sashelp.class */ quit;
NOTE: Table WORK.CLASS2 created, with 10 rows and 5 columns. NOTE: PROCEDURE SQL used (Total process time): real time 0.00 seconds
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Ukar441 and welcome to the SAS Support Communities!
You created a correlated subquery, i.e., the sql-expression sex2 in the subquery is recognized as that variable from WORK.CLASS in the outer query. As such it is syntactically correct.
Equivalently, you could have written:
proc sql; create table class2 as select * from class a where sex2 in (select a.sex2 from sashelp.class); quit;
or (since sex2='M' for all observations in WORK.CLASS)
proc sql; create table class2 as select * from class where sex2 in (select 'M' from sashelp.class); quit;
Like variable sex2 (but unlike, say, sex3, which would cause an error) the character constant 'M' (or any other character constant for that matter) is a valid sql-expression which can be compared to values of character variable sex2 from the outer query.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Ukar441 and welcome to the SAS Support Communities!
You created a correlated subquery, i.e., the sql-expression sex2 in the subquery is recognized as that variable from WORK.CLASS in the outer query. As such it is syntactically correct.
Equivalently, you could have written:
proc sql; create table class2 as select * from class a where sex2 in (select a.sex2 from sashelp.class); quit;
or (since sex2='M' for all observations in WORK.CLASS)
proc sql; create table class2 as select * from class where sex2 in (select 'M' from sashelp.class); quit;
Like variable sex2 (but unlike, say, sex3, which would cause an error) the character constant 'M' (or any other character constant for that matter) is a valid sql-expression which can be compared to values of character variable sex2 from the outer query.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@FreelanceReinh Can you tell us more how a correlated subquery can replace and swap variable names, and the the name from the outer query and replace it in the subquery?
Any other cases where this name-swapping takes place?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@ChrisNZ wrote:
@FreelanceReinh Can you tell us more how a correlated subquery can replace and swap variable names, and the the name from the outer query and replace it in the subquery?
Any other cases where this name-swapping takes place?
I don't think anything special is happening here. It's just that variables from all contributing tables are valid in an sql-expression in a subquery (cf. the wording of the error message obtained with sex3). The expression sex2 in the subquery does not select values from variable sex in SASHELP.CLASS (using a "wrong" or "swapped" name). Rather it's a constant (like the 'M' in my earlier example) w.r.t. the subquery for each record from WORK.CLASS. As a consequence, the subquery results in a temporary table consisting of as many identical values (equal to the current value of sex2 from WORK.CLASS) as there are observations in SASHELP.CLASS, i.e. 19. Trivially, the resulting WHERE condition is always met, unless the table in the FROM clause of the subquery was empty. So, any non-empty dataset, even if totally unrelated to WORK.CLASS in terms of variable names, types and values, could be used as well in place of SASHELP.CLASS in the subquery.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your answer. It was really helpful and explained it well. Following your reasoning gives the following conclusion. If we change the where clause somewhat I get these results.
where sex2 in(select sex2 from sashelp.class); NOTE: Table WORK.CLASS2 created, with 10 rows and 5 columns. where sex2 in(select 'F' from sashelp.class); where sex2 in(select 'Hello' from sashelp.class); where sex2 in(select name from sashelp.class); NOTE: Table WORK.CLASS2 created, with 0 rows and 5 columns. where sex2 in(select a.sex2 from sashelp.class as a); ERROR: Column sex2 could not be found in the table/view identified with the correlation name A. ERROR: Unresolved reference to table/correlation name a.
First example is the original and there sex2 is correlated with sex2 in work.class.
Next three examples evaluates the character expression and gives expected = rows.
Last example when I explicitly point to sashelp.class with a.sex2 I get the error I expected in the first place.
However what was new to me was that in example 1,2 and 3 the "from sashelp.class" in a sense is ignored. Example 1 the select uses sex2 from another table and in 2 and 3 from sashelp.class is sort of ignored.
It is not quite a variable swap as @ChrisNZ suggest but more of an order for searching.
Totally new to me and makes me wonder how many bugs I've created over the years.
Thanks for your help.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Ukar441 wrote:
Thanks for your answer. (...)
However what was new to me was that in example 1,2 and 3 the "from sashelp.class" in a sense is ignored. (...)
You're welcome. In these examples the only impact of "sashelp.class" is the number of (identical) observations resulting from the subquery (i.e. 19). Interestingly, PROC SQL seems to be clever enough not to waste resources to create a huge trivial table if that number is large. [Edit: The run time does not depend on that number.] It uses the same methods whether a constant (like 'M') is selected or a variable from the outer query like sex2, but the set of methods is different in the situation where a variable like name or sex is selected in the subquery. This can be seen by using the undocumented option _method of the PROC SQL statement:
130 proc sql _method; 131 create table class2 as 132 select * from class 133 where sex2 in (select sex from sashelp.class); NOTE: SQL execution methods chosen are: sqxcrta sqxfil sqxsrc( WORK.CLASS ) NOTE: SQL subquery execution methods chosen are: sqxsubq sqxsrc( SASHELP.CLASS ) NOTE: Table WORK.CLASS2 created, with 10 rows and 5 columns. 134 quit; NOTE: PROCEDURE SQL used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 135 136 proc sql _method; 137 create table class2 as 138 select * from class 139 where sex2 in (select sex2 from sashelp.class); NOTE: SQL execution methods chosen are: sqxcrta sqxfil sqxsrc( WORK.CLASS ) NOTE: SQL subquery execution methods chosen are: sqxsubq sqxreps sqxsrc( SASHELP.CLASS ) NOTE: Table WORK.CLASS2 created, with 10 rows and 5 columns. 140 quit; NOTE: PROCEDURE SQL used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 141 142 proc sql _method; 143 create table class2 as 144 select * from class 145 where sex2 in (select 'X' from sashelp.class); NOTE: SQL execution methods chosen are: sqxcrta sqxfil sqxsrc( WORK.CLASS ) NOTE: SQL subquery execution methods chosen are: sqxsubq sqxreps sqxsrc( SASHELP.CLASS ) NOTE: Table WORK.CLASS2 created, with 0 rows and 5 columns.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I-m not the first one puzzled by this. Looked at your answer and googled the method sqxreps and one of the first links was this one sas - Proc sql subquery based on nonexisitng column returns not null - Stack Overflow . A discussion on the same topic. Right or wrong can probably be discussed. My own takeaway will be that it might be a good idea to be more strict in referencing tables/views i.e to use tablename.variablename notation in the sql queries.