Hello Everyone,
This is partially off-topic since Proc FedSQL goes geyond Viya,
however my question might be especially relevant here since
Proc SQL is not CAS-enabled (or, ist it ? in preproduction seems to work fine 😉 ) :
/* Proc SQL with calculated column */
Proc SQL;
select t.*, scan(t.email,-1) as MAILTLD
from mytable as t
where t.flag = 'SPAM' and calculated MAILTLD eq 'com'
;
quit;
How do we refactor this code on Viya using Proc FedSQL ANSI syntax ? I was thinking of correlated subqueries, something like
/* Proc FedSQL correlated subquery - roughly sketched*/
Proc FedSQL sessref=mycas;
select t.*
from caslib."mytable" as t
where t.flag = 'SPAM' and 'com' = ( select scan( v.email , -1)
from caslib."mytable" as v
where v.email = t.email )
;
quit;
But, then I read in Proc FedSQL documentation :
Note: Correlated subqueries are not yet supported on the CAS server.
What option do we still have ? Creating a second "outer" result set then merging with the "inner" query ?
Just replace EQ with = and the syntax becomes valid.
Alternatively consider option2 and filter the data directly when reading from source, then create the calculated column in the Select clause and Group by this calculated column.
data cars;
set sashelp.cars;
run;
proc datasets lib=work nolist nowarn;
delete mycars mycars2;
run;
quit;
/* option 1 */
proc fedsql;
/* create table mycars as*/
select origin, c1.priceType, avg(invoice) as avg_invoice
from
(
select *, case when invoice >= 0 and invoice < 30000 then 'affordable'
when invoice >= 30000 and invoice < 60000 then 'need to think'
else 'not my budget'
end as priceType
from cars
) as c1
where priceType = 'affordable'
group by origin, priceType
;
quit;
/* option 2 */
proc fedsql;
/* create table mycars2 as*/
select
origin,
case
when invoice >= 0 and invoice < 30000 then 'affordable'
when invoice >= 30000 and invoice < 60000 then 'need to think'
else 'not my budget'
end as priceType,
avg(invoice) as avg_invoice
from cars
where invoice >= 0 and invoice < 30000
group by origin, priceType
;
quit;
Repeat your expression from the select for MAILTLD in the where and it should work
Thanks for your reply, I gave this simple code as an example. Of course we can apply the SCAN function straight in the WHERE clause (If I follow you correctly) .
Then how can we proceed when the calculated column is a more complex transformation, for instance coming from a CASE manipulation of values ? The following code is quite arbitrary, this is just an example :
Proc SQL;
select t.*, CASE WHEN scan(t.email,-1) EQ 'com' THEN 'COMMERCIAL' ELSE 'NON COMMERCIAL' as FLAGCOM
from mytable
where t.FLAG = 'SPAM' AND calculated FLAGCOM = 'COMMERCIAL'
;
quit;
If you have several long expressions, think about creating an inline view, the following example shows it
data cars;
set sashelp.cars;
run;
proc delete data=mycars;
run;
proc fedsql ;
create table mycars as
select
origin
, priceType
, avg(invoice) as avg_invoice
from (
select
*
, case
when invoice >= 0 and invoice < 30000 then 'affordable'
when invoice >= 30000 and invoice < 60000 then 'need to think'
else 'not my budget'
end as priceType
from
cars
) as c1
group by
origin
, priceType
;
quit;
proc print data=mycars;
run;
Thanks for your code; however the code I have to migrate requires to filter the rows with a calculated column. The requirement was not clearly expressed, my fault. I ran your code in a SAS 9 session, adding only a WHERE clause before the GROUP BY, like this :
proc fedsql ;
create table mycars as
select origin, priceType, avg(invoice) as avg_invoice
from (
select *, case when invoice >= 0 and invoice < 30000 then 'affordable'
when invoice >= 30000 and invoice < 60000 then 'need to think'
else 'not my budget'
end as priceType
from cars
) as c1
where c1.priceType eq 'affordable'
group by origin, priceType
;
quit;
Proc SQL accepts this kind of syntax, Proc FedSQL does not unfortunately (on SAS 9 at least):
ERROR: Syntax error at or near "EQ"
NOTE: PROC FEDSQL has set option NOEXEC and will continue to prepare statements.
Just replace EQ with = and the syntax becomes valid.
Alternatively consider option2 and filter the data directly when reading from source, then create the calculated column in the Select clause and Group by this calculated column.
data cars;
set sashelp.cars;
run;
proc datasets lib=work nolist nowarn;
delete mycars mycars2;
run;
quit;
/* option 1 */
proc fedsql;
/* create table mycars as*/
select origin, c1.priceType, avg(invoice) as avg_invoice
from
(
select *, case when invoice >= 0 and invoice < 30000 then 'affordable'
when invoice >= 30000 and invoice < 60000 then 'need to think'
else 'not my budget'
end as priceType
from cars
) as c1
where priceType = 'affordable'
group by origin, priceType
;
quit;
/* option 2 */
proc fedsql;
/* create table mycars2 as*/
select
origin,
case
when invoice >= 0 and invoice < 30000 then 'affordable'
when invoice >= 30000 and invoice < 60000 then 'need to think'
else 'not my budget'
end as priceType,
avg(invoice) as avg_invoice
from cars
where invoice >= 0 and invoice < 30000
group by origin, priceType
;
quit;
Thank you both for your help 😃, with the correct operator, FedSQL runs OK on SAS 9 and CAS as well.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!