I encountered this (stunning) issue today:
171 proc sql ;
172 create table ds_datadiff as
173 select *
174 from sasdata.ds
175 where recordid in
176 ( select recordid
177 from delta.datadiff
178 where dataset = "DS"
179 )
180 ;
NOTE: Table WORK.DS_DATADIFF created, with 45 rows and 55 columns.
181 quit ;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
182
183 proc sql ;
184 select b.recordid
185 from delta.datadiff as b
186 where dataset = "DS"
187 ;
ERROR: Column recordid could not be found in the table/view identified with
the correlation name B.
188 quit ;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
189
190 proc sql ;
191 create table ds_datadiff as
192 select *
193 from sasdata.ds
194 where recordid in
195 ( select b.recordid
196 from delta.datadiff as b
197 where dataset = "DS"
198 )
199 ;
ERROR: Column recordid could not be found in the table/view identified with
the correlation name B.
ERROR: Unresolved reference to table/correlation name b.
ERROR: Expression using IN has components that are of different data types.
NOTE: The IN referred to may have been transformed from an OR to an IN at some
point during PROC SQL WHERE clause optimization.
200 quit ;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
The issue is that RECORDID is not is DELTA.DATADIFF as illustrated by the second two SQL procedure above. I find this highly unexpected and worrisome. Should the subquery be "protected", meaning that the SELECT clause in it should not pull from other parts of the SQL? I may begin defensive programming like the second procedure to protect against such errors in coding.
Thank you,
Kevin
SAS is just being helpful. You asked for RECORDID and it found a variable RECORDID and used it.
Not super helpful but this is something that's been around for a long time and is the default behaviour.
Bottom line: always use two-level column names (at least when more than one table is used) in a SQL select, for clear identification.
Hey Kevin,
You're not alone in being stunned by this. I was tipped off to it a few months ago, and started a SAS-L thread that got some good feedback from the 'L: https://listserv.uga.edu/cgi-bin/wa?A2=SAS-L;11810443.1907b .
This code runs without errors:
proc sql ;
select *
from sashelp.class
where name in
(select name /*n.b.: sashelp.iris does not have a variable NAME*/
from sashelp.iris
where species is not missing
)
;
quit;
It turns out the SQL compiler happily turns what the author intended as in inner query into a correlated subquery. In the SAS-L thread @yabwon kindly pointed out that if you add the FEEDBACK option, the log will show you that the reference to NAME in the inner query is interpreted as class.name rather than iris.name:
11 proc sql feedback; 12 select * 13 from sashelp.class 14 where name in 15 (select name /*n.b.: sashelp.iris does not have a variable NAME*/ 16 from sashelp.iris 17 where species is not missing 18 ) 19 ; NOTE: Statement transforms to: select CLASS.Name, CLASS.Sex, CLASS.Age, CLASS.Height, CLASS.Weight from SASHELP.CLASS where CLASS.Name in ( select CLASS.Name from SASHELP.IRIS where IRIS.Species is not null ); 20 quit;
As you showed and Kurt mentioned, the offensive programming to catch this error would be to always use two level names when you have two tables in a query:
21 proc sql ; 22 select * 23 from sashelp.class as a 24 where a.name in 25 (select b.name /*n.b.: sashelp.iris does not have a variable NAME*/ 26 from sashelp.iris as b 27 where species is not missing 28 ) 29 ; ERROR: Column name could not be found in the table/view identified with the correlation name B. ERROR: Unresolved reference to table/correlation name b. 30 quit; NOTE: The SAS System stopped processing this step because of errors.
I haven't been in that habit, but seeing this scared me enough to try to learn this new habit.
--Q.
SAS is just being helpful. You asked for RECORDID and it found a variable RECORDID and used it.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.