Hey SAS Community,
I am trying to use Proc Sql to select data from a database. I have a SAS data set that contains the unique key for the data I want to select in the Proc Sql.
How can I include the SAS data, with the unique keys, in the Proc Sql statement so that I can pull only the data I need from the database?
I've looked around for an answer to this already, but am new to SAS so excuse my ignorance.
Thank you.
How are you extracting data from your server? An example of the query may help.
The answer depends on how you're set up and partially on the size of the data.
SAS can use data directly from multiple sources, though it can be inefficient at times.
proc sql;
create table want as
select t1.*, t2.Description
from server.ServerData as t1
right join SASdata.SASdata as t2
on t1.id = t2.id;
quit;
@lynagh18 wrote:
Hey SAS Community,
I am trying to use Proc Sql to select data from a database. I have a SAS data set that contains the unique key for the data I want to select in the Proc Sql.
How can I include the SAS data, with the unique keys, in the Proc Sql statement so that I can pull only the data I need from the database?
I've looked around for an answer to this already, but am new to SAS so excuse my ignorance.
Thank you.
1. DBKEY is a good option if an index exists on the database side.
Otherwise it can destroy performance.
2. Another common option if you have a small number of values is storing them in one or several macro variables which are used with the in() operator:
select unique VAL into :values ... /* you have to quote strings here */
select * from RDBMS.TAB where VAL in ( &values. )
3. A third option is to upload the list of values to the RDBMS as a table if you are authorised.
I never really trusted DBKEY as it seldom delivered on its promise. Don't forget it generates one query per lookup value.
Here is a (not so quick it turns out) benchmark.
Done on our Teradata server; As usual, only valid for our data on our platform etc....
4 methods used,
- DBKEY is the worst (2 hours!!, and there is an index),
- macro list or MULTI_DATASRC_OPT are the fastest,
- full download in between
I am *not* allowed to upload, so can't test this.
data LOOKUP; do I=1 to 2e3; ACCOUNT_ID=put(I,z10.); output; end ; run;
%****** Macro list ;
proc sql noprint;
select unique quote(trim(ACCOUNT_ID)) into :values separated by ',' from LOOKUP;
create table T1 as
select ACCOUNT_ID
from TERALIB.TAB b
where ACCOUNT_ID in(&values.)
and b.MONTH_KEY = '201712';
quit;
/*
SELECT "ACCOUNT_ID","MONTH_KEY" FROM TERA."TAB" WHERE ( ( ("ACCOUNT_ID" IN ( '0000000001' , '0000000002' ,
... 2000 values .....
'0000001995' , '0000001996' , '0000001997' , '0000001998' , '0000001999' , '0000002000' ) ) ) AND ("MONTH_KEY" = '201712' ) )
NOTE: PROCEDURE SQL used (Total process time):
real time 24.00 seconds
*/
%****** Full download;
proc sql;
create table T2 as select b.ACCOUNT_ID
from LOOKUP a, TERALIB.TAB b
where a.ACCOUNT_ID = b.ACCOUNT_ID
and b.MONTH_KEY = '201712';
quit;
/*
SELECT "ACCOUNT_ID","MONTH_KEY" FROM TERA."TAB" WHERE ("MONTH_KEY" = '201712' )
NOTE: PROCEDURE SQL used (Total process time):
real time 2:57.98
*/
%****** DBKEY= option;
proc sql;
create table T3 as select b.ACCOUNT_ID
from LOOKUP a, TERALIB.TAB(dbkey=ACCOUNT_ID) b
where a.ACCOUNT_ID = b.ACCOUNT_ID
and b.MONTH_KEY = '201712'; *2s;
quit;
/*
2000 queries: USING ("ACCOUNT_ID" VARCHAR (22))SELECT "ACCOUNT_ID","MONTH_KEY" FROM TERA."TAB" WHERE "ACCOUNT_ID"=:"ACCOUNT_ID"
NOTE: PROCEDURE SQL used (Total process time):
real time 02:03:47.07
*/
%****** MULTI_DATASRC_OPT= option;
proc sql;
create view _V as
select b.ACCOUNT_ID
from LOOKUP a, DBLIB.TAB b
where a.ACCOUNT_ID = b.ACCOUNT_ID
and b.MONTH_KEY = '201712'
using libname DBLIB teradata user=xxx .... multi_datasrc_opt=in_clause;
create table T4 as select * from _V;
quit;
/*
NOTE: PROCEDURE SQL used (Total process time):
real time 19.67 seconds
*/
Yes there is a primary index on ACCOUNT_ID:
proc sql;
connect using TERALIB;
select * from connection to TERALIB (
select count(*)
from dbc.indices
where tablename = 'TAB'
and columnname in('MONTH_KEY', 'ACCOUNT_ID')
and IndexType in('P', 'Q')
);
returns 1.
It's also worth noting that the full download is automatically optimised by SAS to a query containing an in() clause if the LOOKUP table is smaller.
Assuming
:
proc sql;
create table want as
select a.*
from datlib.dbtable a, saslib.sastable b
where a.key = b.key;
quit;
This is the basic blueprint for a SQL inner join.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.