Hi,
Can someone tell me why does this code return a flag '1' while it shouldn't ?
proc sql;
create table test as
select region,case
when region in (select region from sashelp.bmt ) then '1'
else '0'
end
as case
from sashelp.demographics;
quit;
Obviously, Region variable is not in BMT dataset so Case variable should be set to 0.
Thanks in advance,
Skp
Puzzling indeed, at first! Here is a hint, try this:
proc sql;
create table test as
select
d.region,
case
when d.region in (select d.region from sashelp.bmt) then '1'
else '0'
end as case
from sashelp.demographics as d;
quit;
Explanation: The reference to column region in the subquery is assumed to be from table sashelp.demographics because region is not a column in sashelp.bmt.
PG
Puzzling indeed, at first! Here is a hint, try this:
proc sql;
create table test as
select
d.region,
case
when d.region in (select d.region from sashelp.bmt) then '1'
else '0'
end as case
from sashelp.demographics as d;
quit;
Explanation: The reference to column region in the subquery is assumed to be from table sashelp.demographics because region is not a column in sashelp.bmt.
PG
I am using 9.4 64 bit and I still receive a '1' for case. Can you give a stronger hint?
proc sql;
drop table test;
create table test as
select
d.region,
case
when d.region in (select d.region from sashelp.bmt ) then '1'
else '0'
end as case
from sashelp.demographics as d ;
quit;
Obs region case
1 AMR 1
2 AMR 1
3 AMR 1
4 AMR 1
5 AMR 1
6 AMR 1
7 AMR 1
The subquery or whatever it's called assumes that region in the select statement is from sashelp.demographics because there is no alias prefix to tell it otherwise so it defaults to whatever table it finds the variable in.
The following will generate the results you expect:
proc sql;
drop table test;
create table test as
select
D.region,
case
when d.region in (select B.region from sashelp.bmt as B ) then '1'
else '0'
end as case
from sashelp.demographics as D ;
quit;
Notice the "d." prefix to column name region in the subquery. d.region refers to column region in sashelp.demographics. As far as the subquery is concerned, d.region is a constant (just like as if you said select "A" from sashelp.bmt, try it! ), the result of which being the value of the constant repeated as many times as there are lines in sashelp.bmt. Note that column region doesn't even exists in table sashelp.bmt, which is why region in the original question was considered to mean d.region by the SQL compiler. Thus, the when condition is always true because b.region is always part of a column of repeated b.region values.
PG
I agree with @pgstats and think this is considered an inline query not a subquery - IIRC, inline queries run for each observation whereas a subquery would run only once.
I couldn't find any reliable source to clarify this point. The expression "inline query" is never mentioned in the 600+ pages standard manual for SQL92. Subqueries however are divided into scalar-, row- and table- subqueries. Thus it seems that any query within another query is a subquery, as far as the standard goes.
PG
Then I probably don't recall correctly 
I'm thinking of when I used to use T-SQL and those types of queries were used occasionally but it's been a *few* years now!
It is called correlated sub-query.
But usually it really correlates to the outer query.
Thanks PGStats ! About the terminology, I couldn't find any 'Inline querry' terminology...I looks like that it is indeed a correlated sub-query.
Cheers
skp
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
