Hello,
I've been running a query on Teradata, and my source table is huge: 3-4 million records. The code has been running for a while now, so I was wondering what I can do to make this faster. I also tried dividing my source dataset into 50 groups (around 75k observations per dataset), but the query for the first group also took forever and I had no option but to terminate it. I would appreciate any suggestions.
libname TD teradata user="%sysget(USER)" password="&pwd." tdpid=rchtera mode=ansi database=user_TD;
proc sql;
   create table TD.table1 as
   select *
   from check; /* check is my base table */
quit;
proc sql;
   connect to teradata (user="%sysget(USER)" password="&pwd." tdpid=rchtera mode=teradata);
   create table new as
   select *
   from connection to teradata
      (select abc.data1,
              cast(abc.data2 as char(19)) as data2,
              abc.data3
       from user_TD.table1 as table
       left join [teradata table] abc on table.data2 = data2
       where abc.data1 not in (select data1 from table)
         and data2 in (select data2 from table));
   disconnect from teradata;
quit;
To test whether a slow network is part of your problem, change your query like so and compare run times.

Replace this:
   CREATE TABLE new AS
   Select *
With this:
   select count(*) as Row_Count
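Applied to the original pass-through query, that test might look something like the sketch below. It reuses the connection settings and the [teradata table] placeholder from the post; I have also renamed the alias table (a reserved word in Teradata) to t and pointed the subqueries at user_TD.table1, which I assume is what was intended:

```
proc sql;
   connect to teradata (user="%sysget(USER)" password="&pwd." tdpid=rchtera mode=teradata);
   select * from connection to teradata
      (select count(*) as Row_Count
       from user_TD.table1 as t
       left join [teradata table] abc on t.data2 = abc.data2
       where abc.data1 not in (select data1 from user_TD.table1)
         and abc.data2 in (select data2 from user_TD.table1));
   disconnect from teradata;
quit;
```

If the count comes back quickly, the bottleneck is moving rows across the network rather than the query itself, and the fix lies in reducing the data transferred, not in rewriting the joins.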
1. in() clauses can be much slower than joins, depending on how the optimiser does its job.
2. I am unsure of the purpose of this:
and data2 in (select data2 from table)
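As an illustration of point 1, the two in() filters could be rewritten as joins, roughly along these lines (a sketch using the column names from the post, with the self-referencing alias table replaced by explicit references to user_TD.table1; note that NOT IN and a left-join/IS NULL anti-join behave differently if data1 can contain nulls):

```
select abc.data1,
       cast(abc.data2 as char(19)) as data2,
       abc.data3
from user_TD.table1 as t
inner join [teradata table] abc
   on t.data2 = abc.data2        /* replaces: data2 in (select data2 ...) */
left join user_TD.table1 as excl
   on abc.data1 = excl.data1     /* anti-join replaces: not in (select data1 ...) */
where excl.data1 is null
```

Whether this is actually faster depends on the optimiser's plan; checking EXPLAIN output for both versions on the Teradata side is the reliable way to compare them.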