Hello,
I've been running a query on teradata, and my source table is huge- 3-4M records. the code has been running for a while now so I was wondering what can I do to make this faster. I also tried diving my source dataset into 50 groups (around 75k obs per dataset), but the first group query also took forever and I had no other option but to terminate it. Would appreciate any suggestions.
libname TD teradata user="%sysget(USER)" password="&pwd." tdpid=rchtera mode=ansi database=user_TD;
proc sql;
create table TD.table1 as
select *
from check; /*check is my base table*/
quit;
Proc SQL;
CONNECT TO teradata (user="%sysget(USER)" password="&pwd." tdpid=rchtera mode=teradata);
CREATE TABLE new AS
Select *
From connection to teradata
(SELECT abc.data1,
CAST(abc.data2 AS CHAR(19)) AS data2
, abc.data3
from user_TD.table1 AS table
left join [teradata table] abc on table.data2 = data2
where abc.data1 not in (select data1 from table)
and data2 in (select data2 from table));
DISCONNECT FROM TERADATA;
QUIT;
To test whether a slow network is part of your problem change your query like so and compare run times:
Replace this: CREATE TABLE new AS
Select *
With this: select count(*) as Row_Count
1. in() clauses can be much slower than joins, depending how the optimiser does its job.
2. I am unsure of the purpose of this:
and data2 in (select data2 from table)
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.