Jade_SAS
Pyrite | Level 9

Hi All,

    I have a quick question about writing SAS data back to Teradata.

    1) If I use the TPT (Teradata Parallel Transporter) API to load data.

LIBNAME TD TERADATA SERVER=*** DATABASE=***;

Data TD.TEST(TPT=YES FASTLOAD=YES TPT_MAX_SESSIONS=4);

set work.test123;

Run;

The process time is attached in Time1

 

   2) If I use the normal SAS way to load data

LIBNAME TD TERADATA SERVER=*** DATABASE=***;

Data TD.TEST;

set work.test123;

Run;

The process time is attached in Time2.

 

What puzzles me is why loading the data via 1) takes more time; I thought it should be more efficient. Any comments, please!

 

Thank you!

Jade

1 ACCEPTED SOLUTION

Accepted Solutions
TomKari
Onyx | Level 15

It's important to understand the context. These options are typically called "bulk loader" options, because when you use them SAS will invoke a special loading utility of the DBMS in question.

 

I needed to use this option with a different database product a few years ago. If I didn't use the bulk loader, investigations showed that SAS was copying the data by doing a SQL "Insert" for each row. This approach was horrendously slow.

 

By using the bulk loader option, on the other hand, SAS would create a utility file with the data to be loaded, and then call the database load utility to load the file as a separate process. This worked over 100 times faster than the regular load syntax.

 

However, when using the regular syntax SAS only needs to create and run a series of simple SQL commands, as opposed to all of the overhead of creating a utility file, and then invoking the DBMS loader. So for small tests, like yours, I'm not surprised at all that the times are reversed.

 

For a fair test, try it on a data volume that takes at least 20 minutes to load with the regular load mechanism, and then see if you get an improvement with the bulk loader.
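When running such a comparison, it can also help to see what SAS is actually sending to the DBMS. A minimal sketch using standard SAS/ACCESS tracing and timing options (the exact trace output will depend on your engine and setup):

```sas
/* Write DBMS call details and full timing statistics to the SAS log */
options fullstimer sastrace=',,,d' sastraceloc=saslog nostsuffix;
```

With these set, the log will show whether SAS issues row-by-row inserts or invokes the bulk loader, along with real/CPU times for each step.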

 

Tom


Jade_SAS
Pyrite | Level 9

Thank you for the quick reply, KurtBremser! So does that mean my slow network cannot let the parallel version work efficiently? Is there a way to fix it?

Generally speaking, the parallel version should be faster, right?

 

Jade

 

Kurt_Bremser
Super User

The first step in optimizing is always to get proper knowledge of the problem. So you need to check first (in cooperation with your Teradata admins) where the bottleneck is.

Divide size of the data by time, and see how close that is to the maximum throughput of your network (or simply copy the SAS dataset to a network resource for comparison).
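As a back-of-the-envelope illustration of that calculation (the figures here are purely hypothetical), a quick check in SAS:

```sas
/* Hypothetical figures: a 3 GB table that took 10 minutes to load */
data _null_;
   size_mb = 3 * 1024;        /* table size in MB */
   secs    = 600;             /* observed load time in seconds */
   mbps    = size_mb / secs;  /* effective throughput in MB/s */
   put 'Effective throughput: ' mbps 8.2 ' MB/s';
run;
```

If that works out to, say, 5 MB/s on a network that can sustain 100 MB/s, the network is probably not the bottleneck.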

If network is not the culprit, watch the performance of the server(s). Maxed-out CPUs can't be optimized.

Jade_SAS
Pyrite | Level 9

That makes sense. I did try a big data set and it was faster with TPT.

How can we decide at what data size to use TPT or not? Is there a rough size threshold based on your experience? Thank you!

 

Jade

TomKari
Onyx | Level 15

I don't think there's any harm in just using it all the time. The overhead it adds is minimal, and it's much easier than worrying about putting it in and taking it out.

 

Tom

kiranv_
Rhodochrosite | Level 12

First of all, I think you should have DBCREATE_TABLE_OPTS='primary index (column of index)' in your data set options; otherwise you may get a skewed table in Teradata if your first column is not unique.
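For example, a sketch of the original load step with an explicit primary index (ID_COLUMN is a placeholder; substitute the column that actually distributes your rows evenly):

```sas
/* Bulk load with TPT, specifying the primary index at table creation */
data td.test(tpt=yes fastload=yes tpt_max_sessions=4
             dbcreate_table_opts='primary index (id_column)');
   set work.test123;
run;
```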

I feel there could be many reasons for this: first, the workload at that particular point in time; also, these run times are so small that if you run the same thing multiple times you may get slightly different times.

Previously I tried to benchmark by running the same query again after a few minutes, often getting slightly different results; this can be attributed to the workload at that point in time. If you really want to benchmark, you need to run tables with a lot of data, keeping everything else constant.

One more reason is that FastLoad works by opening multiple sessions; there is a limited number of sessions per utility, which may already be in use by other users, so you have to wait for your turn.

 


