SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

question about writting SAS data back to Teradata, why using TPT needs longer process time?

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 139
Accepted Solution

question about writting SAS data back to Teradata, why using TPT needs longer process time?

Hi All,

    I have a quick question about writting SAS data back to Teradata.

    1) If I use the TPT (Teradata Parallel Transporter) API to load data.

LIBNAME TD TERADATA SERVER=*** DATABASE=***;

Data TD.TEST(TPT=YES FASTLOAD=YES TPT_MAX_SESSIONS=4);

set work.test123;

Run;

The process time is attached in Time1

 

   2) If I use the normal SAS way to load data

LIBNAME TD TERADATA SERVER=*** DATABASE=***;

Data TD.TEST;

set work.test123;

Run;

The process time is attached in Time2.

 

What puzzles me is why I need more time if loading data via 1), I thought it should be more efficient. Any comments, please!

 

Thank you!

Jade


Accepted Solutions
Solution
‎08-23-2017 01:21 PM
PROC Star
Posts: 1,146

Re: question about writting SAS data back to Teradata, why using TPT needs longer process time?

It's important to understand the context. These options are typically called "bulk loader" options, because when you use them SAS will invoke a special loading utility of the DBMS in question.

 

I needed to use this option with a different database product a few years ago. If I didn't use the bulk loader, investigations showed that SAS was copying the data by doing a SQL "Insert" for each row. This approach was horrendously slow.

 

By using the bulk loader option, on the other hand, SAS would create a utility file with the data to be loaded, and then call the database load utility to load the file as a separate process. This worked over 100 times faster than the regular load syntax.

 

However, when using the regular syntax SAS only needs to create and run a series of simple SQL commands, as opposed to all of the overhead of creating a utility file, and then invoking the DBMS loader. So for small tests, like yours, I'm not surprised at all that the times are reversed.

 

For a fair test, try it on a data volume that takes at least 20 minutes to load with the regular load mechanism, and then see if you get an improvement with the bulk loader.

 

Tom

View solution in original post


All Replies
Super User
Posts: 7,405

Re: question about writting SAS data back to Teradata, why using TPT needs longer process time?

If the simple version is already saturating your network, the parallel version will only add overhead.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 139

Re: question about writting SAS data back to Teradata, why using TPT needs longer process time?

Thank you for the quick reply, KurtBremser! So it means my slow network can not make the parallel version work efficiently? Is there a way to fix?

General speaking, the parallel version should be faster, right?

 

Jade

 

Super User
Posts: 7,405

Re: question about writting SAS data back to Teradata, why using TPT needs longer process time?

The first step in optimizing is always to get proper knowledge of the problem. So you need to check first (in cooperation with your Teradata admins) where the bottleneck is.

Divide size of the data by time, and see how close that is to the maximum throughput of your network (or simply copy the SAS dataset to a network resource for comparison).

If network is not the culprit, watch the performance of the server(s). Maxed-out CPUs can't be optimized.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 139

Re: question about writting SAS data back to Teradata, why using TPT needs longer process time?

Thank you so much!

Solution
‎08-23-2017 01:21 PM
PROC Star
Posts: 1,146

Re: question about writting SAS data back to Teradata, why using TPT needs longer process time?

It's important to understand the context. These options are typically called "bulk loader" options, because when you use them SAS will invoke a special loading utility of the DBMS in question.

 

I needed to use this option with a different database product a few years ago. If I didn't use the bulk loader, investigations showed that SAS was copying the data by doing a SQL "Insert" for each row. This approach was horrendously slow.

 

By using the bulk loader option, on the other hand, SAS would create a utility file with the data to be loaded, and then call the database load utility to load the file as a separate process. This worked over 100 times faster than the regular load syntax.

 

However, when using the regular syntax SAS only needs to create and run a series of simple SQL commands, as opposed to all of the overhead of creating a utility file, and then invoking the DBMS loader. So for small tests, like yours, I'm not surprised at all that the times are reversed.

 

For a fair test, try it on a data volume that takes at least 20 minutes to load with the regular load mechanism, and then see if you get an improvement with the bulk loader.

 

Tom

Frequent Contributor
Posts: 139

Re: question about writting SAS data back to Teradata, why using TPT needs longer process time?

That's make sense. I did try big data set and it's faster with TPT.

How can we dicide what size of data to use TPT or not? Is there a roughly size vable based on your experience? Thank you!

 

Jade

PROC Star
Posts: 1,146

Re: question about writting SAS data back to Teradata, why using TPT needs longer process time?

I don't think there's any harm just using it all the time. The overhead it adds is minimal, and it's much easier than worrrying about putting it in and taking it out.

 

Tom

PROC Star
Posts: 279

Re: question about writting SAS data back to Teradata, why using TPT needs longer process time?

first of all I think you should have dbcreate_table_opts= 'primary index(columnof index)'); in your dataset options otherwise you may get skewed table in Teradata if your first column is not unique.

I feel there could be many reasons for the this, first of all workload at that particular point in time and also this run times are so small, if you run the same thing multiple times you may have slightly different times.

previusly i tried to bench mark before by running same query after few minutes, often yielding slightly  different results, this can attributed to workload at that point of time. if you really want to benchmark you need to run the tables with lot of data, keeping everything else constant.

One more reasons is fastload works by opening multiple sessions, there are limited amount of sessions per utility which can be pretty much used by other users and you have to wait for your turn.

 

Frequent Contributor
Posts: 139

Re: question about writting SAS data back to Teradata, why using TPT needs longer process time?

Thank you !

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 412 views
  • 3 likes
  • 4 in conversation