I want to select in proc sql with a where statement from a list
from test.account(where =(delete_ind = 'N'
and current_record_ind ='Y'
and appid in(select id from test2)))
ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant, a datetime constant,
a missing value, -.
ERROR 76-322: Syntax error, statement will be ignored.
ERROR: Syntax error while parsing WHERE clause.
ERROR: Invalid value for the WHERE option.
How should I do this?
@Kiteulf What about an inner join?
from test.account r inner join test2 l on r.delete_ind='N' and r.current_record_ind='Y' and r.appid=l.id
or implicit:
from test.account r,test2 l where r.delete_ind='N' and r.current_record_ind='Y' and r.appid=l.id
You can't do that in the Data Set Options directly.
See the difference in the two PROC SQL steps below. Should give you the answer.
/* Test data */
data one;
input id x;
datalines;
1 10
2 20
3 30
;
data two;
input id;
datalines;
1
3
;
/* Error */
proc sql;
create table want as
select one.*
from one(where = (id in (select id from two)));
quit;
/* All good */
proc sql;
create table want as
select one.*
from one
where id in (select id from two);
quit;
So then I will need to do;
from test.account
where delete_ind = 'N'
and current_record_ind ='Y'
and appid in(select id from test2)
The problem then is that the query bogs down. At the moment I have only 3 records from test2 but there will be more later.
If I am doing it manually and
from test.account (where =(delete_ind = 'N'
and current_record_ind ='Y'
and appid in(3819532,3809511,3812779))) as t0
it is super fast
@Kiteulf What about an inner join?
from test.account r inner join test2 l on r.delete_ind='N' and r.current_record_ind='Y' and r.appid=l.id
or implicit:
from test.account r,test2 l where r.delete_ind='N' and r.current_record_ind='Y' and r.appid=l.id
It sort of works.
It is not painstakingly slow but also not fast. Valid input! I forgot about that possibility.
from test.account r,test2 l where r.delete_ind='N' and r.current_record_ind='Y' and r.appid=l.id
@Kiteulf Depending on the number of rows in your tables below eventually performs much better.
data want;
set test.account(where=(delete_ind='N' and current_record_ind='Y'));
if _n_=1 then
do;
if 0 then set test2(keep=id rename=(id=appid));
dcl hash h1 (dataset:'test2(keep=id rename=(id=appid))');
h1.defineKey('appid')
h1.defineDone();
end;
if h1.check()=0;
run;
I forgot to mention that this data is on a SQL server. So the hashing will not work very well towards that right...
I forgot to mention that this data is on a SQL server. So the hashing will not work very well towards that right...
That's certainly not an insignificant detail. You could investigate if appid and/or id are indexed and then eventually have this condition first in your where clause. Not sure how smart the SQL Server compiler is so these things are always worth a try.
r.appid=l.id
If performance is really that important to you then you could also investigate or talk to a DBA if there are any SQL Server side tweaks possible - like writing code that executes in parallel. You then would use Postgres SQL code and pass it to the DB from SAS via explicit SQL.
Also make sure that you don't use a select * but list explicitly the columns you really need. Sending the data from the DB to the SAS Server is often the bottleneck.
A few other things that also can improve performance significantly - docu here:
DIRECT_SQL=Yes
READBUFF set to a value suitable for the environment
And last but not least: I assume BOTH your tables are in the DB. If not then please tell us as this will be highly likely the reason for bad performance - and there are ways how to code differently to improve things.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.