10-05-2017 12:59 PM
Appreciate if someone help me with the following situation. I've a Table A in SAS which has some 2 million rows with 25 variables and
I've a other table B in Netezza which 10 millions of records with 42 variables. Now I want to join these two tables and I'm not sure whether I should use Implicit or explicit pass through for better performance. In either method, I would like to know the basic structure of SAS code to achieve this. I've done joins using only SAS datasets or only non SAS datasets but not with both.
10-05-2017 01:14 PM
You cannot use explicit pass through because that works on the server and you don't have the SAS data set on the server.
If you can load your table to the server, that's a different story.
Since you're using IMPLICIT queries, you can treat them both as 'SAS tables' and your code isn't any different.
10-05-2017 01:16 PM
Well, sending the 2mil rows to the database should be quicker than taking 10mil rows from the database, but by number alone. However its not that simple, there is the hardware available to each system for instance. That is quite a few records, you may want to test with a subset and get code working before running it in total. You may want to consult with your Netezza person.
As for code, its pretty similar either way,
proc sql; connect to net (...); select * from connection to net ( select ....); /* or to send to db*/ execute by net(create table yourworkspace as select * from saslib.datas); disconnect from net; quit; /* Or you could setup libnames and make the SQL a bit smaller.*/
Many examples out there on how to go one way or the other, but which would be better is hard to say and also depends a lot on what you want to do further with it, for instance if you were creating a SAS report, might make more sense to have all the data in SAS.
10-05-2017 03:33 PM
Also consider exporting to a flat file, copy that to the database server, read it into a table there, and do the join in the database. Just as one method to compare, performancewise.
10-06-2017 02:20 AM