BookmarkSubscribeRSS Feed
JJP1
Pyrite | Level 9

Hi All,

SAS DI job is taking more time to do the inner join between SAS and external table.(After inner join there is table loader creates final table in SAS job)

so i tried of using pass through option as yes.but this option is not enabled..to increase the performance.

Would you please suggest any other options. please have a look on sample joining code of the DI job.

proc sql;
   create view work.GTGYHH as
   select
      EXTERNAL.COL1 length = 8   
         format = $8.
         informat = $8.
         label = 'COL1',
      EXTERNAL.COL2 length = 8   
         format = 6.
         informat = 6.
         label = 'COL2',
      EXTERNAL.COL3 length = 8   
         format = 6.
         informat = 6.
         label = 'COL3',
      SAS_TABLE.COL4 length = 8   
         format = 6.
         informat = 6.
         label = 'COL4',
      SAS_TABLE.COL5 length = 60   
         format = $60.
         informat = $60.
         label = 'COL5',
      SAS_TABLE.COL6 length = 8   
         
         label = 'COL6',
      SAS_TABLE.COL7 length = 8   
         
        label = 'COL7'
   from
      EXTERNAL, 
      SAS_TABLE
   where
      SAS_TABLE.JJJJJ = EXTERNAL.JJJJJ
      and SAS_TABLE.JJJJJ <> .
      and SAS_TABLE.IIIII = "AAA"
   ;
quit;

 

 

4 REPLIES 4
s_lassen
Meteorite | Level 14

Whether you can use pass-through is not really important: If the queries are relatively simple, SAS automatically converts to native SQL.

 

But joins between two different databases (in this case, SAS and DB2) are often slow. What happens is probably that SAS reads a record and then makes a query to get the matching values in the DB2 table. 

 

The normal way to speed this up is to read the values from DB2 first, using a dynamically generated IN clause. Something like this:

proc sql noprint;
  select distinct quote(trim(JJJJJ),"'") into :invalues
  from SAS_TABLE
  where JJJJJ is not null and IIIII='AAA';
  create table extract as select
    JJJJJ,
    COL1 length = 8   
         format = $8.
         informat = $8.
         label = 'COL1',
      COL2 length = 8   
         format = 6.
         informat = 6.
         label = 'COL2',
     COL3 length = 8   
         format = 6.
         informat = 6.
         label = 'COL3'
from EXTERNAL where JJJJJ in(&inclause); proc sql; create table work.GTGYHH as select EXTERNAL.COL1 , EXTERNAL.COL2, EXTERNAL.COL3, SAS_TABLE.COL4 length = 8 format = 6. informat = 6. label = 'COL4', SAS_TABLE.COL5 length = 60 format = $60. informat = $60. label = 'COL5', SAS_TABLE.COL6 length = 8 label = 'COL6', SAS_TABLE.COL7 length = 8 label = 'COL7' from Extract EXTERNAL, SAS_TABLE where SAS_TABLE.JJJJJ = EXTERNAL.JJJJJ and SAS_TABLE.JJJJJ <> . and SAS_TABLE.IIIII = "AAA" ; quit;

There are a couple of limitations to this specific technique: SAS macro variables cannot be longer than 64K, and there is also a limitation to how many values DB2 will accept in an IN clause. In the "good" old days, that was only one thousand values, but the bar has been raised since then; I am not sure how high. Check how many values were read from DB2 in the SELECT INTO statement (look at macro variable SQLOBS), and check the length of the generated macro variable INVALUES.

 

If you hit one of those limitations, it is probably easier to split the code by writing a temporary SAS program and %INCLUDE that, e.g.:

filename tempsas temp;
proc sql;
  create table invalues as select distinct JJJJJ from SAS_TABLE;
quit;

data _null_;
  set invalues end=done;
  file tempsas temp;
  if _N_=1 then 
    put 'set';
  select(mod(_N_,2000));
    when(1)
      put 'EXTERNAL(keep=JJJJJ COL1 COL2 COL3 where=(JJJJJ in(' "'" JJJJJ +(-1) "'";
    when(0) do;
      put "'" JJJJJ +(-1) "')))";
      if done then do;
        put ';';
        stop;
        end;
      end;
    otherwise
      put "'" JJJJJ +(-1) "'";
    end;
  if done then put ')));';
run;

data extract;
  %include tempsas;
  /* put label and format statements here */
run;
 

And then continue with the last part of my previous example.

ChrisNZ
Tourmaline | Level 20

You must choose where the query runs.

 

If you join SAS data and DB2 data in a SAS query, SAS had no option but to bring the DB2 data to SAS, and execute the join in SAS.

To speed this up, you must reduce the amount of DB2 data transferred.

 

To achieve this, you can either upload the SAS keys into a DB2 temporary table, and run a table look-up there, if you are allowed to upload data, or upload the keys as part of the query, typically in an IN statement. Depending on the number of values to pass, different methods can be used to build that IN clause. A macro variable, as shown by @s_lassen, is the simplest method, if the length of the IN clause is below 64k.

JJP1
Pyrite | Level 9

Thanks @s_lassen and @ChrisNZ for new option

Actually i have no facility of using key option for DB2 source.

 

So i just created a SAS view from extrenal table and then performed the join between two SAS datasets (join output again view) which will increase the performance. and finally table loader to load the data into target

ChrisNZ
Tourmaline | Level 20
The only way to speed up the query is to transfer less DB2 data. Unless your view does that I don't see how it can help.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 823 views
  • 0 likes
  • 3 in conversation