BookmarkSubscribeRSS Feed
lynagh18
Fluorite | Level 6

Hey SAS Community,

 

I am trying to use Proc Sql to select data from a database. I have a SAS data set that contains the unique key for the data I want to select in the Proc Sql.

 

How can I include the SAS data, with the unique keys, in the Proc Sql statement so that I can pull only the data I need from the database?

 

I've looked around for an answer to this already, but am new to SAS so excuse my ignorance.

 

Thank you.

5 REPLIES 5
Reeza
Super User

How are you extracting data from your server? An example of the query may help. 

The answer depends on how you're set up and partially on the size of the data. 

 

SAS can use data directly from multiple sources, though it can be inefficient at times. 

 

proc sql;
create table want as
select t1.*, t2.Description
from server.ServerData as t1
right join SASdata.SASdata as t2
on t1.id = t2.id;
quit;

 

 


@lynagh18 wrote:

Hey SAS Community,

 

I am trying to use Proc Sql to select data from a database. I have a SAS data set that contains the unique key for the data I want to select in the Proc Sql.

 

How can I include the SAS data, with the unique keys, in the Proc Sql statement so that I can pull only the data I need from the database?

 

I've looked around for an answer to this already, but am new to SAS so excuse my ignorance.

 

Thank you.


 

LinusH
Tourmaline | Level 20
There were literally tons of such inquiries on the forum, tip: do a search.
Hint: DBKEY.
Data never sleeps
ChrisNZ
Tourmaline | Level 20

1. DBKEY is a good option if an index exists on the database side.

Otherwise it can destroy performance.

 

2. Another common option if you have a small number of values is storing them in one or several macro variables which are used with the in() operator:
select unique VAL into :values ...    /* you have to quote strings here */  
select * from RDBMS.TAB where VAL in ( &values. )

 

 3. A third option is to upload the list of values to the RDBMS as a table if you are authorised.

ChrisNZ
Tourmaline | Level 20

I never really trusted DBKEY as it seldom delivered on its promise. Don't forget it generates one query per lookup value.

Here is a (not so quick it turns out) benchmark.

Done on our Teradata server; As usual, only valid for our data on our platform etc....

 

4 methods used,

- DBKEY is the worst (2 hours!!, and there is an index),

- macro list or MULTI_DATASRC_OPT are the fastest,

 - full download in between

I am *not* allowed to upload, so can't test this.

 

 


data LOOKUP; do I=1 to 2e3; ACCOUNT_ID=put(I,z10.); output; end ;  run;

%****** Macro list ;
proc sql noprint; 
  select unique quote(trim(ACCOUNT_ID)) into :values separated by ',' from LOOKUP;
  create table T1 as 
  select ACCOUNT_ID            
  from TERALIB.TAB       b 
  where ACCOUNT_ID  in(&values.) 
    and b.MONTH_KEY = '201712';
quit;
/* 
SELECT "ACCOUNT_ID","MONTH_KEY" FROM TERA."TAB"  WHERE  ( ( ("ACCOUNT_ID" IN  ( '0000000001' , '0000000002' , 
... 2000 values .....
'0000001995' , '0000001996' , '0000001997' , '0000001998' , '0000001999' , '0000002000' ) ) ) AND  ("MONTH_KEY" = '201712' ) )
NOTE: PROCEDURE SQL used (Total process time):
      real time           24.00 seconds
*/

%****** Full download;
proc sql;                       
  create table T2 as select b.ACCOUNT_ID 
  from LOOKUP a, TERALIB.TAB b
  where a.ACCOUNT_ID = b.ACCOUNT_ID
    and b.MONTH_KEY  = '201712';           
quit;
/* 
SELECT "ACCOUNT_ID","MONTH_KEY" FROM TERA."TAB"  WHERE  ("MONTH_KEY" = '201712' )
NOTE: PROCEDURE SQL used (Total process time):
      real time           2:57.98  
*/

%****** DBKEY= option;
proc sql;                       
  create table T3 as select b.ACCOUNT_ID 
  from LOOKUP a, TERALIB.TAB(dbkey=ACCOUNT_ID)  b 
  where a.ACCOUNT_ID = b.ACCOUNT_ID
    and b.MONTH_KEY  = '201712';    *2s;
quit;
/*
2000 queries:  USING ("ACCOUNT_ID" VARCHAR (22))SELECT "ACCOUNT_ID","MONTH_KEY" FROM TERA."TAB" WHERE "ACCOUNT_ID"=:"ACCOUNT_ID"
NOTE: PROCEDURE SQL used (Total process time):
      real time        02:03:47.07
*/

%****** MULTI_DATASRC_OPT= option;
proc sql;
  create view _V as
  select b.ACCOUNT_ID 
  from LOOKUP a, DBLIB.TAB b 
  where a.ACCOUNT_ID = b.ACCOUNT_ID
    and b.MONTH_KEY  = '201712' 
  using libname DBLIB teradata user=xxx .... multi_datasrc_opt=in_clause;
  create table T4 as select * from _V;
quit;
/*
NOTE: PROCEDURE SQL used (Total process time):
      real time           19.67 seconds
*/

 

 

Yes there is a primary index on ACCOUNT_ID:

 

proc sql; 
  connect using TERALIB;
  select * from connection to TERALIB (
    select count(*)
    from dbc.indices
    where tablename   = 'TAB'
      and columnname  in('MONTH_KEY', 'ACCOUNT_ID')
      and IndexType   in('P', 'Q') 
  );

returns 1.

 

 

It's also worth noting that the full download is automatically optimised by SAS to a query containing an in() clause if the LOOKUP table is smaller.

 

 

Kurt_Bremser
Super User

Assuming

  • datlib is the library defined for your database
  • saslib is the library where your SAS dataset resides
  • dbtable is the table in your database
  • sastable is your dataset with the unique keys
  • key is the name of the unique key

:

proc sql;
create table want as
select a.*
from datlib.dbtable a, saslib.sastable b
where a.key = b.key;
quit;

This is the basic blueprint for a SQL inner join.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 864 views
  • 0 likes
  • 5 in conversation