10-13-2014 05:37 AM
I had created a code that basically uploads the data from Netezza to SAS and then aggregrates it using the relevant variables using Proc Tabulate. However, this was utilising a lot of resources and time as the dataset size is mostly 50-100 million rows spanning 150 columns.
To, avoid the resource utilization i have been trying to use in-database processing for Netezza, by providing the SQLGENERATION = DBMS option in my libname statement and the using the libname to directly input the Netezza dataset into the proc tabulate procedure.
My understanding of in-database processing is that it should fire a sql query into netezza(when we use the SQLGENERATION = DBMS option in the libname) that performs the tabulation of data accordingly in Netezza and sends the summarized data to SAS. However, this hasn't been working lyk we envisaged, and has been basically firing a select * statement to Netezza and pulling the data into SAS and then performing the aggregation in SAS rather than Netezza.
I have not been able to figure out why this must have been happening and some guidance on the same would be great.
10-13-2014 10:02 AM
Try using PROC SUMMARY to summarize the data. That should be able to push more operations into the data. You could possibly then build a report off of the summarized data using TABULATE if you want.
10-21-2014 05:59 PM
Or try explicit-pass through query. It's nice, because it's explicit. That is, you get to write SQL code in whatever database-specific flavor of SQL, and then say, "please run this SQL step written in your own language on this database and then return the results to me (in SAS)."
10-22-2014 01:34 AM
With in database processing you need to understand the limitation of both sides.
One rule always exists: When it is not possible to do "in-database" processing all data will be copied to SAS and processes there.
An other set of options could be:
sql_ip_trace=source sqlgeneration=dbmust sqlmapputto=SAS_put dbidirectexec ;
The dbmust forcing en error when the sql is not going to the external dbms.
The aggregate functions of a rdbms are not always logical the same as the summary functions of SAS.
This is seen at grouping levels. To eliminate that as cause use a single analyses variable. Check the grouping effects (multi-level) on reading the mauals.
Sometimes SAS has programmed something with advanced techniques using cursors.
Using that with a Analytics database like netezaa / terradata can be a performance disaster. This a failure of SAS.
For that one you have to prove the issue happening and hoping TS will react adequate.
That will leave you to not using SAS but coding tot Netezza native sql as explicit pass throuqh.
It is really not using SAS as SAS has sold the usage for this
10-22-2014 03:21 AM
I would recommend to use SQL passthrough to get the data ordered by the relevant variables in Netezza and then use proc summary ... by ... in SAS.
10-22-2014 07:44 AM
Just see the aggregate functions, these are often unique to each dbms and by that causing troubles getting well translated by SAS.
IBM Knowledge Center (windows aggregate functions netezza)
10-22-2014 04:19 PM
I also had the same issue.
I used readbuff=1000 direct_sql=yes libname options and it worked for me.
You can use SAStrace=',,,ds' SAStraceloc=SASlog to trace where your SQL is getting processed.