Tom
Super User

Note that in Teradata the PRIMARY INDEX determines how rows are distributed across the compute nodes. You want a primary index whose values are evenly spread, so that approximately the same number of observations is processed by each compute node. Otherwise you cannot take advantage of the massively parallel architecture of the Teradata engine.
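As a sketch of how you might control this from SAS: the DBCREATE_TABLE_OPTS= data set option lets you specify the PRIMARY INDEX when SAS creates the Teradata table. The libname credentials, table, and column names below are placeholders, and the example assumes cust_id has many distinct, evenly distributed values.

```sas
/* Placeholder connection details - adjust for your site */
libname tera teradata server=myserver user=myuser password=mypass database=mydb;

/* Create the Teradata table with an explicit, well-distributed primary index */
data tera.cust_keys (dbcreate_table_opts='PRIMARY INDEX (cust_id)');
  set work.cust_keys;
run;
```

Without DBCREATE_TABLE_OPTS=, Teradata typically falls back to using the first column as the primary index, which can produce a badly skewed table.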

 

Where in the process is it "timing out"?

Is it when running the query in Teradata?

If so then ask your Teradata admin team to help you optimize the query.

 

Is it when trying to save the query results back to Teradata? Make sure you are not creating a skewed table in Teradata; use an appropriate PRIMARY INDEX.

 

Is it when trying to download the data to SAS? How much data are you downloading, and do you need all of it? Could you instead summarize it in Teradata and return only the summary (i.e., smaller) data to SAS?
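The "summarize in Teradata" idea can be done with explicit SQL pass-through, so the aggregation runs on the Teradata side and only the summary rows cross the wire. The connection options, database, table, and column names below are placeholders.

```sas
/* Explicit pass-through: the inner SELECT runs entirely in Teradata */
proc sql;
  connect to teradata (server=myserver user=myuser password=mypass);
  create table work.summary as
  select * from connection to teradata (
    select acct_id,
           count(*)    as n_rows,
           sum(amount) as total_amt
    from mydb.big_table
    group by acct_id
  );
  disconnect from teradata;
quit;
```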

SAS_in_Florida
Fluorite | Level 6
The data transfer into Teradata completes without error. The timeout happens during the inner join. I will look into finding an appropriate Teradata admin to get their input. Thank you for the suggestion.
Patrick
Opal | Level 21

@SAS_in_Florida wrote:
The data transfer into Teradata completes without error. The timeout happens during the inner join. I will look into finding an appropriate Teradata admin to get their input. Thank you for the suggestion.

@SAS_in_Florida  It's certainly a good idea to have a Teradata DBA look into why the inner join takes so long.

 

One strategy for improving performance that worked for me in the past for such a heterogeneous inner-join situation was to generate SQL with a WHERE clause instead of an inner join. For many databases the limit for a single SQL statement is 32 KB, so you need to generate and send multiple queries.

 

Here is some sample code to illustrate the approach.

/* Mimic the big table: 100 million rows, 1 million distinct keys */
data tera_big;
  val='1';
  do key=1 to 1000000;
    do j=1 to 100;
      output;
    end;
  end;
run;

/* The small SAS table holding the keys of interest */
data sas_small;
  do key=1 to 1000000 by 100;
    output;
  end;
run;

/* Pull the rows for one batch of keys and append them to WORK.WANT */
%macro sql_pump(keylist);
  proc sql;
    create table work.result_set as
    select key,val
    from tera_big
    where key in (%unquote(&keylist))
    ;
  quit;
  proc datasets lib=work nolist;
    append data=work.result_set base=work.want;
    run;
    delete result_set;
    run;
  quit;
%mend;

/* Build comma-separated key lists of up to ~31,000 characters and
   call %sql_pump once per batch */
data _null_;
  set sas_small end=last;
  by key;
  if last.key;   /* de-duplicate keys */

  length keylist $32000;
  retain keylist;
  keylist=catx(',',keylist,put(key,f16. -l));

  if _n_=1 then
    do;
      /* start fresh: remove any WANT table left over from a prior run */
      call execute('proc datasets lib=work nolist nowarn; delete want; quit;');
    end;
  if last or length(keylist)>31000 then
    do;
      call execute( cats('%sql_pump(%nrstr(',keylist,'));') );
      call missing(keylist);
    end;
run;

 

SAS_in_Florida
Fluorite | Level 6
I like this, and I was initially trying something similar: sending 1,000 keys at a time to Teradata using a WHERE statement on the key. I was getting some sort of Teradata error. I initially thought my data being improperly defined as Num/Char was the culprit, but I spent too much time on it and essentially started going down the rabbit hole I am in now. I will look through your example, as this is a good approach as well. Thank you for taking the time to share this.
Patrick
Opal | Level 21

The number of SQL statements you send needs to be relatively small for this to potentially outperform the temp-table approach. It should be fine for 10K keys.

If the key is character, then just also add single quotes around the values:

keylist=catx(',',keylist,cats("'",key,"'"));
Reeza
Super User
For 110,000 records, why not bring the table into SAS and process it locally? That isn't a big table at all. Then write whatever results you need back at the end.
SAS_in_Florida
Fluorite | Level 6
I am joining the 110K records with a table with millions of rows. The matches are then brought back to SAS for additional processing.
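For exactly this small-table-against-big-remote-table pattern, another option worth knowing is the SAS/ACCESS MULTI_DATASRC_OPT=IN_CLAUSE libname option, which tells SAS to turn the small local table's keys into an IN list that is pushed down to Teradata rather than joining across libraries row by row. A minimal sketch, with placeholder connection details and table/column names:

```sas
/* Placeholder connection details - adjust for your site */
libname tera teradata server=myserver user=myuser password=mypass
        database=mydb multi_datasrc_opt=in_clause;

/* SAS builds "WHERE t.key IN (...)" from WORK.KEYS_110K and
   sends it to Teradata, so only the matches come back */
proc sql;
  create table work.matches as
  select t.*
  from tera.big_table t
       inner join work.keys_110k k
       on t.key = k.key;
quit;
```

Note that SAS limits how many values it will put in the generated IN clause, so for 110K keys you may still need the batching approach discussed above.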
Reeza
Super User

And I assume this method generates a string too long to submit to the server?

 

https://gist.github.com/statgeek/dd57887f1d34f3dc6476471d9ac035ed#file-sas_dynamically_generate_filt...

 


%let age = 14;

/* Generate one PROC SQL query whose IN list is built from the filtered table */
data test;
  set sashelp.class end=eof;
  where age = &age.;

  if _n_ = 1 then do;
    call execute('proc sql; select * from sashelp.class where name in (');
  end;

  if not eof then do;
    call execute(quote(name));
    call execute(",");
  end;

  if eof then do;
    call execute(quote(name));
    call execute("); quit;");
  end;
run;

