Hello,
I've been running a query against Teradata, and my source table is huge: 3-4 million records. The code has been running for a while now, so I was wondering what I can do to make it faster. I also tried dividing my source dataset into 50 groups (around 75k observations per dataset), but the query for the first group also took forever and I had no option but to terminate it. Would appreciate any suggestions.
libname TD teradata user="%sysget(USER)" password="&pwd." tdpid=rchtera mode=ansi database=user_TD;
proc sql;
   create table TD.table1 as
   select *
   from check;  /* check is my base table */
quit;
proc sql;
   connect to teradata (user="%sysget(USER)" password="&pwd." tdpid=rchtera mode=teradata);
   create table new as
   select *
   from connection to teradata
   (select abc.data1,
           cast(abc.data2 as char(19)) as data2,
           abc.data3
    from user_TD.table1 as table
    left join [teradata table] abc on table.data2 = data2
    where abc.data1 not in (select data1 from table)
      and data2 in (select data2 from table));
   disconnect from teradata;
quit;
To test whether a slow network is part of your problem, change your query like so and compare run times:

Replace this:
CREATE TABLE new AS
Select *

With this:
select count(*) as Row_Count
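Put together, the timing test would look something like this. This is a sketch only: the inner pass-through query is kept exactly as posted (including the `[teradata table]` placeholder), and only the outer SELECT is changed so that just one row comes back to SAS instead of a full table being created.

```sas
proc sql;
   connect to teradata (user="%sysget(USER)" password="&pwd." tdpid=rchtera mode=teradata);
   /* Outer "create table new as select *" replaced by a row count */
   select count(*) as Row_Count
   from connection to teradata
   (select abc.data1,
           cast(abc.data2 as char(19)) as data2,
           abc.data3
    from user_TD.table1 as table
    left join [teradata table] abc on table.data2 = data2
    where abc.data1 not in (select data1 from table)
      and data2 in (select data2 from table));
   disconnect from teradata;
quit;
```

If this version is fast but the original is slow, the bottleneck is likely moving the result set across the network rather than the query itself.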
1. IN () clauses can be much slower than joins, depending on how the optimiser does its job.
2. I am unsure of the purpose of this:
and data2 in (select data2 from table)
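To illustrate point 1, here is one possible rewrite of the inner query that expresses both subqueries as joins instead of IN () clauses. This is a sketch under an assumption about intent: that you want rows of abc whose data2 matches table1 (the inner join) and whose data1 does NOT appear in table1 (the left-join anti-join pattern). The alias `table` from the post is renamed to `t` here, since TABLE is a reserved word in Teradata.

```sas
proc sql;
   connect to teradata (user="%sysget(USER)" password="&pwd." tdpid=rchtera mode=teradata);
   create table new as
   select *
   from connection to teradata
   (select abc.data1,
           cast(abc.data2 as char(19)) as data2,
           abc.data3
    from [teradata table] abc
    /* replaces: and data2 in (select data2 from table) */
    inner join user_TD.table1 t
       on abc.data2 = t.data2
    /* anti-join replaces: abc.data1 not in (select data1 from table) */
    left join user_TD.table1 x
       on abc.data1 = x.data1
    where x.data1 is null);
   disconnect from teradata;
quit;
```

Whether this is faster depends on the Teradata optimiser and table statistics, but anti-joins (or NOT EXISTS) usually optimise better than NOT IN, and NOT IN also behaves differently when the subquery can return nulls, so check the row counts match before relying on it.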