In Proc Sql statment, include SAS data to use as a primary key

lynagh18 · Posted 05-02-2018 03:21 PM

Hey SAS Community,

I am trying to use Proc Sql to select data from a database. I have a SAS data set that contains the unique key for the data I want to select in the Proc Sql.

How can I include the SAS data, with the unique keys, in the Proc Sql statement so that I can pull only the data I need from the database?

I've looked around for an answer to this already, but am new to SAS so excuse my ignorance.

Thank you.

Reeza · Posted 05-02-2018 03:25 PM

How are you extracting data from your server? An example of the query may help.

The answer depends on how you're set up and partially on the size of the data.

SAS can use data directly from multiple sources, though it can be inefficient at times.

proc sql;
create table want as
select t1.*, t2.Description
from server.ServerData as t1
right join SASdata.SASdata as t2
on t1.id = t2.id;
quit;

@lynagh18 wrote:

Hey SAS Community,

I am trying to use Proc Sql to select data from a database. I have a SAS data set that contains the unique key for the data I want to select in the Proc Sql.

How can I include the SAS data, with the unique keys, in the Proc Sql statement so that I can pull only the data I need from the database?

I've looked around for an answer to this already, but am new to SAS so excuse my ignorance.

Thank you.

LinusH · Posted 05-02-2018 03:26 PM

There were literally tons of such inquiries on the forum, tip: do a search.
Hint: DBKEY.

Data never sleeps

ChrisNZ · Posted 05-02-2018 08:00 PM

1. DBKEY is a good option if an index exists on the database side.

Otherwise it can destroy performance.

2. Another common option if you have a small number of values is storing them in one or several macro variables which are used with the in() operator:
select unique VAL into :values ... /* you have to quote strings here */
select * from RDBMS.TAB where VAL in ( &values. )

3. A third option is to upload the list of values to the RDBMS as a table if you are authorised.

High-Performance SAS Coding - Third Edition

ChrisNZ · Posted 05-02-2018 11:57 PM

I never really trusted DBKEY as it seldom delivered on its promise. Don't forget it generates one query per lookup value.

Here is a (not so quick it turns out) benchmark.

Done on our Teradata server; As usual, only valid for our data on our platform etc....

4 methods used,

- DBKEY is the worst (2 hours!!, and there is an index),

- macro list or MULTI_DATASRC_OPT are the fastest,

- full download in between

I am *not* allowed to upload, so can't test this.


data LOOKUP; do I=1 to 2e3; ACCOUNT_ID=put(I,z10.); output; end ;  run;

%****** Macro list ;
proc sql noprint; 
  select unique quote(trim(ACCOUNT_ID)) into :values separated by ',' from LOOKUP;
  create table T1 as 
  select ACCOUNT_ID            
  from TERALIB.TAB       b 
  where ACCOUNT_ID  in(&values.) 
    and b.MONTH_KEY = '201712';
quit;
/* 
SELECT "ACCOUNT_ID","MONTH_KEY" FROM TERA."TAB"  WHERE  ( ( ("ACCOUNT_ID" IN  ( '0000000001' , '0000000002' , 
... 2000 values .....
'0000001995' , '0000001996' , '0000001997' , '0000001998' , '0000001999' , '0000002000' ) ) ) AND  ("MONTH_KEY" = '201712' ) )
NOTE: PROCEDURE SQL used (Total process time):
      real time           24.00 seconds
*/

%****** Full download;
proc sql;                       
  create table T2 as select b.ACCOUNT_ID 
  from LOOKUP a, TERALIB.TAB b
  where a.ACCOUNT_ID = b.ACCOUNT_ID
    and b.MONTH_KEY  = '201712';           
quit;
/* 
SELECT "ACCOUNT_ID","MONTH_KEY" FROM TERA."TAB"  WHERE  ("MONTH_KEY" = '201712' )
NOTE: PROCEDURE SQL used (Total process time):
      real time           2:57.98  
*/

%****** DBKEY= option;
proc sql;                       
  create table T3 as select b.ACCOUNT_ID 
  from LOOKUP a, TERALIB.TAB(dbkey=ACCOUNT_ID)  b 
  where a.ACCOUNT_ID = b.ACCOUNT_ID
    and b.MONTH_KEY  = '201712';    *2s;
quit;
/*
2000 queries:  USING ("ACCOUNT_ID" VARCHAR (22))SELECT "ACCOUNT_ID","MONTH_KEY" FROM TERA."TAB" WHERE "ACCOUNT_ID"=:"ACCOUNT_ID"
NOTE: PROCEDURE SQL used (Total process time):
      real time        02:03:47.07
*/

%****** MULTI_DATASRC_OPT= option;
proc sql;
  create view _V as
  select b.ACCOUNT_ID 
  from LOOKUP a, DBLIB.TAB b 
  where a.ACCOUNT_ID = b.ACCOUNT_ID
    and b.MONTH_KEY  = '201712' 
  using libname DBLIB teradata user=xxx .... multi_datasrc_opt=in_clause;
  create table T4 as select * from _V;
quit;
/*
NOTE: PROCEDURE SQL used (Total process time):
      real time           19.67 seconds
*/

Yes there is a primary index on ACCOUNT_ID:

proc sql; 
  connect using TERALIB;
  select * from connection to TERALIB (
    select count(*)
    from dbc.indices
    where tablename   = 'TAB'
      and columnname  in('MONTH_KEY', 'ACCOUNT_ID')
      and IndexType   in('P', 'Q') 
  );

returns 1.

It's also worth noting that the full download is automatically optimised by SAS to a query containing an in() clause if the LOOKUP table is smaller.

High-Performance SAS Coding - Third Edition

Kurt_Bremser · Posted 05-02-2018 03:30 PM

Assuming

datlib is the library defined for your database
saslib is the library where your SAS dataset resides
dbtable is the table in your database
sastable is your dataset with the unique keys
key is the name of the unique key

:

proc sql;
create table want as
select a.*
from datlib.dbtable a, saslib.sastable b
where a.key = b.key;
quit;

This is the basic blueprint for a SQL inner join.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

In Proc Sql statment, include SAS data to use as a primary key

Re: In Proc Sql statment, include SAS data to use as a primary key

Re: In Proc Sql statment, include SAS data to use as a primary key

Re: In Proc Sql statment, include SAS data to use as a primary key

Re: In Proc Sql statment, include SAS data to use as a primary key

Re: In Proc Sql statment, include SAS data to use as a primary key

In Proc Sql statment, include SAS data to use as a primary key

Re: In Proc Sql statment, include SAS data to use as a primary key

Re: In Proc Sql statment, include SAS data to use as a primary key

Re: In Proc Sql statment, include SAS data to use as a primary key

Re: In Proc Sql statment, include SAS data to use as a primary key

Re: In Proc Sql statment, include SAS data to use as a primary key

Registration is open

SAS Training: Just a Click Away