Desktop productivity for business analysts and programmers

Teradata and SAS

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 81
Accepted Solution

Teradata and SAS

 

Hi,

Teradata is faster than SAS and I have been told that it would be better to push queries to Teradata and
run them there.

 

I have a SAS dataset and a large teradata table(8 million rows) and need to join them. I use a libname statement to

connect to Teradata:
Libname mytera teradata user='xxxx' pass='xxxxx' tdpid=abc schema=abc_efg ;

 

I'm a little confused,could you give me a code example that

How it would be if I want to join tables in SAS?

How it would be if I want to join them in Teradata?

 

Thanks.


Accepted Solutions
Solution
‎10-21-2016 03:54 AM
Respected Advisor
Posts: 3,841

Re: Teradata and SAS

Where processing should happen really depends on the concrete case. What you normally want to minimize with heterogenous joins (=combining data from different data sources) is the movement of data between servers.

 

Case 1: Big Teradata table, small SAS table, result needed in SAS only a small subset of rows in Teradata

Here you certainly want to push all the processing to Teradata so you need to somehow upload the SAS data into Teradata for the join.

If you only need key columns from the SAS side to sub-set rows in Teradata then "dbkey" is great; else you probably need to go for a temporary table in Teradata (if allowed to do so).

 

Case 2: Big Teradata table, small SAS table, result needed in SAS are almost all rows from Teradata table

Here processing can happen in SAS; eventually using a data step where you load your small SAS table into a hash and then do the lookup this way. Use Keep/Drop to only load the required columns from Teradata to SAS. There are also options like "readbuff" which you might want to alter from their default settings.

 

There is no "one fits all" approach but you need to acquire a bit a deeper understanding of how things work to come up with the right solution on a case by case basis. Below a good starting point.

Capture.PNG

http://support.sas.com/documentation/cdl/en/acreldb/69039/HTML/default/viewer.htm#n1v1cfazem7dejn1xi...

 

View solution in original post


All Replies
Esteemed Advisor
Posts: 5,202

Re: Teradata and SAS

"Faster" is a relative word.

Given the same hardware, I would say that SAS is generally faster, I've worked with a Teradata installation that was much slower that the ordinary SAS 4 core Windows server.

But Teradata can deploy quite large MPP setups, so very large datasets, yes faster because of more iron.

 

To join in SAS, just issue a libname to your TS schema, Be aware of that the whole TD table will be transferred to the SAS session (unless you have some TD table specific where filtering).

 

To join in TD, typically you need to upload your SAs table to TD, and then you could use SQL pass-through, or if you got the settings right, perhaps you could even use implicit SQL.

 

If your SAS table is relatively small, take a look at the DBKEY= data set option.

Data never sleeps
Frequent Contributor
Posts: 81

Re: Teradata and SAS

Can you please give me an example?

 

Thanks.

Solution
‎10-21-2016 03:54 AM
Respected Advisor
Posts: 3,841

Re: Teradata and SAS

Where processing should happen really depends on the concrete case. What you normally want to minimize with heterogenous joins (=combining data from different data sources) is the movement of data between servers.

 

Case 1: Big Teradata table, small SAS table, result needed in SAS only a small subset of rows in Teradata

Here you certainly want to push all the processing to Teradata so you need to somehow upload the SAS data into Teradata for the join.

If you only need key columns from the SAS side to sub-set rows in Teradata then "dbkey" is great; else you probably need to go for a temporary table in Teradata (if allowed to do so).

 

Case 2: Big Teradata table, small SAS table, result needed in SAS are almost all rows from Teradata table

Here processing can happen in SAS; eventually using a data step where you load your small SAS table into a hash and then do the lookup this way. Use Keep/Drop to only load the required columns from Teradata to SAS. There are also options like "readbuff" which you might want to alter from their default settings.

 

There is no "one fits all" approach but you need to acquire a bit a deeper understanding of how things work to come up with the right solution on a case by case basis. Below a good starting point.

Capture.PNG

http://support.sas.com/documentation/cdl/en/acreldb/69039/HTML/default/viewer.htm#n1v1cfazem7dejn1xi...

 

Respected Advisor
Posts: 3,069

Re: Teradata and SAS

What is the joining logic between the Teradata table and the SAS one? How many rows are in your SAS table?

 

Both tables need to be in the same environment to be joined. If the SAS table is the smaller one then loading that into Teradata as a temporary table is a good approach. You can then run an SQL passthru query in Teradata to do the join. 

Valued Guide
Posts: 2,111

Re: Teradata and SAS

[ Edited ]

Doriana,

 

As SASKiwi said, both tables need to be in the same environment to do the join.  A lot of times, analyst do not have write access to the main database.  In that case, we have to do the join in SAS.  There are a couple of strategies that I have used.

1) do the SQL join referenceing the two tables and use the SQL code to minimize the columns and rows transfered from TD (using TD passthru subqueries if possible).

2) Use SQL passthru to being in the smallest set of TD data that you can into a local SAS dataset without the SAS table criteria (e.g. subset on 'year' and specific columns).  Then use SQL to join resulting SAS datasets.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 417 views
  • 0 likes
  • 5 in conversation