AJ_Brien
Quartz | Level 8

Hello,

 

I've been running a query on Teradata, and my source table is huge: 3-4M records. The code has been running for a while now, so I was wondering what I can do to make this faster. I also tried dividing my source dataset into 50 groups (around 75K obs per dataset), but even the first group's query took forever and I had no option but to terminate it. Would appreciate any suggestions.

libname TD teradata user="%sysget(USER)" password="&pwd." tdpid=rchtera mode=ansi database=user_TD;

proc sql;
create table TD.table1 as
select *
from check; /*check is my base table*/
quit;

proc sql;
connect to teradata (user="%sysget(USER)" password="&pwd." tdpid=rchtera mode=teradata);

create table new as
select *
from connection to teradata
	(select abc.data1,
	        cast(abc.data2 as char(19)) as data2,
	        abc.data3
	 from user_TD.table1 as t   /* "table" is a reserved word in Teradata, so use another alias */
	 left join [teradata table] as abc
	   on t.data2 = abc.data2   /* qualify the right-hand join column to avoid ambiguity */
	 where abc.data1 not in (select data1 from user_TD.table1)
	   and abc.data2 in (select data2 from user_TD.table1));
disconnect from teradata;
quit;
2 REPLIES
SASKiwi
PROC Star

To test whether a slow network is part of your problem, change your query like so and compare run times:

 

Replace this:
CREATE TABLE new AS
Select *

With this:
select count(*) as Row_Count
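Applied to the original pass-through query, the timing test would look something like this (a sketch reusing the poster's connection settings and column names; `[teradata table]` is left as the poster's placeholder):

```sas
proc sql;
connect to teradata (user="%sysget(USER)" password="&pwd." tdpid=rchtera mode=teradata);

/* Only one row travels back over the network, so if this is   */
/* still slow, the bottleneck is the query itself rather than  */
/* the transfer of the full result set into SAS.               */
select Row_Count from connection to teradata
	(select count(*) as Row_Count
	 from user_TD.table1 as t
	 left join [teradata table] as abc
	   on t.data2 = abc.data2
	 where abc.data1 not in (select data1 from user_TD.table1)
	   and abc.data2 in (select data2 from user_TD.table1));
disconnect from teradata;
quit;
```

If the count comes back quickly, the query plan is fine and the time is going into downloading millions of rows; if it is still slow, look at the query (and the in() clauses) instead.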
ChrisNZ
Tourmaline | Level 20

1. in() clauses can be much slower than joins, depending on how the optimiser does its job.

 

2. I am unsure of the purpose of this:

and data2 in (select data2 from table)
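For point 1, the NOT IN could be expressed as a join-based exclusion instead; a sketch using the names from the original query (note that NOT IN and an anti-join are not equivalent when the subquery column contains nulls, so check data1 for nulls before swapping; the second in() clause is omitted here pending the answer to point 2):

```sas
proc sql;
connect to teradata (user="%sysget(USER)" password="&pwd." tdpid=rchtera mode=teradata);

create table new as
select *
from connection to teradata
	(select abc.data1,
	        cast(abc.data2 as char(19)) as data2,
	        abc.data3
	 from user_TD.table1 as t
	 left join [teradata table] as abc
	   on t.data2 = abc.data2
	 /* anti-join replacing: abc.data1 not in (select data1 ...) */
	 left join user_TD.table1 as excl
	   on abc.data1 = excl.data1
	 where excl.data1 is null);
disconnect from teradata;
quit;
```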

 

