Hi,
We have a migration ongoing from Netezza to Vertica. I'm using SAS DIS, and we have converted all the Netezza tables to Vertica tables. Most of it works fine, but there are two problems I need to figure out. We are using SAS/ACCESS to Vertica.
Some jobs have bigger datasets to upload to the database, and those had BULKLOAD=YES with Netezza. As I understand it, Vertica doesn't support BULKLOAD, so I had to remove that. I also changed the Table Loader (Load style: Append to Existing) technique from 'proc append' to 'insert sql', because with append I got warnings:
WARNING: Variable XX has different lengths on BASE and DATA files (BASE 2560 DATA 40)
from all the character columns.
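For reference, the load steps look roughly like this (a simplified sketch with placeholder library and table names, not the exact generated code):

/* Old Netezza load: BULKLOAD=YES as a data set option on the target */
proc append base=ntz.target_table (bulkload=yes)
            data=work.staging;
run;

/* Current Vertica load after removing BULKLOAD and switching to insert sql */
proc sql;
   insert into vrt.target_table
   select * from work.staging;
quit;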
Problem 1: when I removed BULKLOAD=YES, execution times increased from a few minutes to many hours. Is there any 'bulkload' option that works with Vertica?
And problem 2 may be a consequence of problem 1 (?). After the job has run for 8 hours, I get this error:
ERROR: Error attempting to CREATE a DBMS table. ERROR: CLI execute error: ROLLBACK 4213: Object "table_name" already exists.
I can't reproduce that error with limited datasets, so it occurs only after long runs. I also checked that all the work table names match in the log, so it's not (or shouldn't be) caused by table references. The table is not empty before this append, and the job should not be trying to create anything.
I am not familiar with Vertica, but I can see from the documentation that there is no bulk load facility. One other way to speed things up when writing to a DBMS table is to set the LIBNAME or data set option INSERTBUFF to a high value. You probably want to experiment with the number; according to the documentation, the default value depends on the row length, so the optimal value may depend on the table being inserted.
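A rough sketch of what that could look like; the connection details are placeholders and the buffer size is just a starting point to experiment with:

/* INSERTBUFF on the libname applies to all loads through it */
libname vrt vertica server="vertica-host" port=5433 user=myuser
        password="xxxxx" database=mydb insertbuff=10000;

/* It can also be set per load as a data set option, e.g. with PROC APPEND */
proc append base=vrt.target_table (insertbuff=10000)
            data=work.staging;
run;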
What happens with PROC APPEND? Are all the character columns in your SAS input really 2560 characters long? And the length of the corresponding values in Vertica only 40 characters? You should probably check if the conversion from Netezza to Vertica went well, then.
Oh, sorry, I misread your original post. When the BASE variable is LONGER than the DATA variable, there is really no problem with append. You can just use the FORCE option in APPEND, and it should work. You will still get the warnings, though.
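Something along these lines (library and table names are placeholders):

proc append base=vrt.target_table
            data=work.staging
            force;
run;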
As I said, I have no experience with Vertica. But if I got similar messages from another DBMS (e.g. SQL Server), I would suspect that the character variables had not been optimally created in the conversion: what used to be standard character variables (e.g. VARCHAR(40)) may have been converted to some sort of long text type in the new database. These may work in some cases, but they can slow down performance in other cases, and sometimes not work at all for your purposes. It depends on the DBMS. But maybe Vertica has only long text fields?
The numbers in the warnings may have changed because the DBMAX_TEXT option for the Vertica libname has been set to a new value.
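If you want to check or control that, DBMAX_TEXT can be set on the LIBNAME statement (assuming your SAS/ACCESS to Vertica version supports it; the connection values and the 2560 here are only example values):

libname vrt vertica server="vertica-host" user=myuser password="xxxxx"
        database=mydb dbmax_text=2560;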