04-13-2017 08:03 PM
I have a SAS data set with 5 million records and a Teradata table with 400 million records that I need to left join on one key variable in PROC SQL. I need to keep 7 variables from the SAS data set and 2 from the Teradata table. I am looking for a more efficient way to join these tables.
Environment: SAS EG 7.1, SAS 9.4 Grid on linux.
Note: There are no indexes on the Teradata table, and I am not permitted to create any.
06-08-2017 08:59 AM - edited 06-08-2017 09:01 AM
Your results closely match what I experienced yesterday. I created a 2M row data set and it took 1:57 (one hour and 57 minutes) to move it into a VOLATILE table. I backed off the size to 50K rows (2 columns - $10 and $3).
The first test I ran was a simple INSERT (no MULTISTMT=YES) and it took slightly over 4 minutes to insert the 50K rows. Next, I specified TPT=NO MULTISTMT=YES on the target table. The 50K rows were INSERTed in just under 3 minutes. That isn't great, but it is ~33% faster than doing nothing. We can't use TPT=YES because it opens its own connection and cannot see the VOLATILE table.
I used TPT=YES FASTLOAD=YES and was able to load the 2M row table in under 7 seconds. I think it is clear that you really want to use FASTLOAD.
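For reference, a FASTLOAD of that kind might look like the sketch below. The libref `td` and table name `stage1` are illustrative only; FASTLOAD requires the target table to be empty, which is why this works for a permanent or DBA-created staging table but not a VOLATILE one.

```sas
/* Bulk load a SAS data set into an empty Teradata table via the TPT API */
libname td teradata server=ABCD user=XXXX password=XXXXXX;

proc append base=td.stage1(TPT=YES FASTLOAD=YES)
            data=work.home force;
run;
```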
Given your situation, I would describe the problem to your DBA. They could create an empty table for you and allow you to FASTLOAD into it and delete the contents. If this approach is taken, be sure to check out the link below on DELETEing data fast. They may also allow you to create a table in a playpen. Either of these approaches would be greatly preferred to using a VOLATILE table.
When dealing with your DBA, express concern that by not being able to create a table you will be using more Teradata resources than required. Point out that if you aren't able to move the small set of data to the server, then SAS may unnecessarily read large amounts of data from Teradata into SAS (SAS will perform the join when it is best performed by Teradata).
When I was in SAS consulting I dealt with this type of thing quite often. If you explain it as "I am trying to make efficient use of database resources" you have a good chance of having them help you.
So, ask for permission to create the table in a playpen and if they say "no" then ask them to create the table for you. Either way it is better than the VOLATILE table strategy.
Once you have permission to create your table, you will want to ensure that it is indexed properly and has up-to-date statistics. This will help make the join processing as efficient as possible. It may be fine to load the table, then collect statistics one time. This could work because your table will be similar each time it is loaded, since the data is deleted prior to each new load. I think I discuss statistics in the Troubleshooting SAS and Teradata Query Performance Problems paper.
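Collecting statistics can be done with explicit pass-through once the table is loaded. A minimal sketch, assuming a hypothetical staging table `stage1` and join key `acct_key`:

```sas
proc sql;
   connect to teradata(server=ABCD user=XXXX password=XXXXXX);
   /* Refresh optimizer statistics on the join key column */
   execute (COLLECT STATISTICS ON stage1 COLUMN (acct_key)) by teradata;
   disconnect from teradata;
quit;
```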
You may find these helpful:
DBIDIRECTEXEC: GOFAST=YES for Database Processing (SAS Communities Article)
DELETE Your Teradata Data Fast! (This will help if the DBA creates a table for you)
04-13-2017 08:29 PM
Please check this link; it was answered by a SAS employee and may be helpful.
04-14-2017 05:39 AM
04-25-2017 11:15 AM
A straightforward approach that should work would be to:
1. First Bulk Load the SAS dataset to a Teradata table
2. Join the two Teradata tables (could use an implicit join with SAS PROC SQL, or an explicit SQL pass-through join).
3. Then, if necessary, pull the resulting join back down to SAS as a SAS dataset.
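The three steps above could be sketched roughly as follows. All names (libref `td`, tables `small5m` and `big400m`, columns `v1`–`v7`, `x1`, `x2`, `key`) are placeholders, not the poster's actual names:

```sas
/* 1. Bulk load the 5M-row SAS data set into Teradata */
libname td teradata server=ABCD user=XXXX password=XXXXXX;

proc append base=td.small5m(TPT=YES FASTLOAD=YES)
            data=work.small5m force;
run;

/* 2. Join in-database with explicit pass-through, and
   3. pull only the result back down to SAS */
proc sql;
   connect to teradata(server=ABCD user=XXXX password=XXXXXX);
   create table work.joined as
   select * from connection to teradata
      (select a.v1, a.v2, a.v3, a.v4, a.v5, a.v6, a.v7,
              b.x1, b.x2
       from small5m a
       left join big400m b
         on a.key = b.key);
   disconnect from teradata;
quit;
```

Doing the join in-database means only the joined result crosses the network, rather than 400 million rows.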
06-01-2017 09:47 PM
I tried to insert data into a volatile table in Teradata, as mentioned by @kiranv_, and then used explicit SQL pass-through to get my data into SAS. The issue is that I have 1 million rows of SAS data to insert into the volatile table, and this is taking a long time. I believe we cannot use FASTLOAD or MULTILOAD when inserting into a volatile table. I used TPT and MULTISTMT, but my problem is not solved. Is there a way I can improve the performance?
Note: I only can create volatile tables in Teradata.
LIBNAME TD TERADATA SERVER=ABCD USER=XXXX PASSWORD=XXXXXX CONNECTION=GLOBAL DBMSTEMP=YES;

PROC DELETE DATA=TD.temp1; RUN;

proc sql;
   connect to teradata(SERVER=ABCD USER=XXXX PASSWORD=XXXXXX CONNECTION=GLOBAL);
   execute (CREATE VOLATILE TABLE temp1(NAME CHAR(10), STATE CHAR(3))
            ON COMMIT PRESERVE ROWS) by teradata;
   execute (COMMIT WORK) by teradata;
quit;

options sastrace=',,,d' sastraceloc=saslog;

proc append base=TD.temp1(TPT=YES multistmt=yes DBCOMMIT=10000)
            data=WORK.HOME FORCE;
run;
06-01-2017 10:31 PM - edited 06-01-2017 11:35 PM
This could be due to the primary index. Here NAME is your primary index. If you do not explicitly specify a primary index, your first column becomes the primary index. Is your NAME column unique? If not, it will lead to skewing, which in turn will affect your performance. If I were you, I would make a few changes.
1. I would create your volatile table as a multiset table with no primary index. (NoPI is best for staging tables because rows can be inserted quickly, as data can be loaded into the table randomly. A multiset table will not check for duplicate records, further enhancing performance. If you do not specify SET or MULTISET in ANSI mode, SET tables are created, and SET tables check for duplicates.)
2. I would declare NAME as VARCHAR(10) instead of CHAR(10).
Please check the code below.
proc sql;
   connect to teradata(SERVER=ABCD USER=XXXX PASSWORD=XXXXXX CONNECTION=GLOBAL);
   execute (CREATE MULTISET VOLATILE TABLE temp1(NAME VARCHAR(10), STATE CHAR(3))
            NO PRIMARY INDEX
            ON COMMIT PRESERVE ROWS) by teradata;
   execute (COMMIT WORK) by teradata;
quit;
06-05-2017 01:10 PM
I did some benchmarking in our own environment, and what I found was that 1 million records took approximately 2 minutes (without FASTLOAD). That table has around 150 columns.
I do not know how much spool space is available to you, or whether that has any impact on this. If possible, please give us some info, such as how many columns are in your table. Is this the only process that takes time?
If possible, try loading your table when the workload is light and see whether that has any impact. I would ask @JBailey, who is an expert and has written quite a few articles on moving data from SAS into Teradata.
One of the best articles by @JBailey on moving data from SAS to Teradata is given below; it could be of some help.
06-06-2017 08:00 AM
You are way too kind!
I really need to take some time and update the doc you referenced, but the content is mostly relevant.
06-05-2017 06:23 PM
Make sure you also set DBCOMMIT=0
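With the volatile-table setup from earlier in the thread, that would mean something like the sketch below (table and data set names as in the earlier post; the option combination here is an assumption based on the earlier finding that TPT=YES cannot see the VOLATILE table):

```sas
/* DBCOMMIT=0 commits once at the end of the load
   instead of after every n rows */
proc append base=TD.temp1(TPT=NO MULTISTMT=YES DBCOMMIT=0)
            data=WORK.HOME force;
run;
```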
2 hours for 1M records is seriously slow. Is reading data from Teradata also that slow?
06-06-2017 07:58 AM
It has been a while since I have played around with this. I will take some time and see what I can come up with. I will try to get back here by Friday.
Does the DDL for temp1 reflect what you are actually loading into the database?
In the meantime, one suggestion: discuss this with your DBA and see if they will create a permanent table for you. They can let you load it and delete all the rows, but not drop it. This could allow you to FASTLOAD into it and then work from there.
I will let you know what I find.