Hello!
Is it possible to transform this sql statement to be carried out in parallel?
Proc SQL NoPrint;
Connect To ODBC (DSN=sassql);
Create Table Data.a1 As Select * From Connection To ODBC
(
Select KeyDate_YYYYMM, MonthDifference_Sales, Sum(OpenOrder_Value_BillingNotDue) As Value
From vFact_Orderbacklog_SAS
Where
KeyDate_YYYYMM>=200601
AND ScheduleLineCategory_Key not in ('XX', 'YY', 'ZZ')
AND Ultimo_Filter_Reporting =1
AND BusinessUnit_Key =8
Group By KeyDate_YYYYMM, MonthDifference_Sales
);
Create Table DAta.a2 As Select * From Connection To ODBC
(
Select KeyDate_YYYYMM, MonthDifference_Sales, Sum(OpenOrder_Qty_BillingNotDue_t) As Qty
From vFact_Orderbacklog_SAS
Where
KeyDate_YYYYMM>=200601
AND ScheduleLineCategory_Key not in ('XX', 'YY', 'ZZ')
AND Ultimo_Filter_Reporting =1
AND BusinessUnit_Key =8
Group By KeyDate_YYYYMM, MonthDifference_Sales
);
Disconnect From ODBC;
Quit;
Thx&kind regards
Running Parallel SAS jobs, where each job fires off a single query, requires having SAS/CONNECT licensed!
The MP Connect feature of SAS/CONNECT provides the functionality you are looking for.
Check out the following link for Sample template code http://support.sas.com/rnd/scalability/tricks/connect.html#smpmp
Hope this helps,
Ahmed
Yes. For example you could for example run two or more queries at the same time with minor changes to your date selection. this would allow you to read a year or more data in each query. You could then combine the results of each query once they are complete.
Query 1:
Where
KeyDate_YYYYMM>=200601 and KeyDate_YYYYMM <200701
Query 2:
Where
KeyDate_YYYYMM>=200701 and KeyDate_YYYYMM <200801
You're right, this could lead to an improvement, but I was thinking of some kind of multi-threading. The trouble is, that my programs don't work out yet (and I would prefer to stick with proc sql when accessing the SQL-server).
Multi-threading isn't really the issue here as this is not a CPU intensive process. It is IO and probably network bandwidth between SQL Server and SAS that is the bottleneck. I've had good success with the approach I've suggested. I can only suggest you try it for yourself and see what improvements you get. I've done as many as 8 or so parallel queries like this and it has usually turned out to be the network bandwidth being the worst bottleneck. If both servers are in the same server room then it will be very fast but if they are across town or across country.....in the worst scenario I've dealt with the SQL server was in a different country with a 2 or 3 thousand km of undersea fibre-optic cable in between!
Just to add, would executing some SQL on the server to create the final table, then selecting from that be quicker (i.e. processing all on DB, then extract results)?
proc sql;
connect to oracle (...);
/* Create a temp table with all the needed results */
execute (create table TEMP$TEMP_TABLE as select ... from ...) by oracle;
/* Now extract the results to SAS */
create table MY_RESULTS as
select * from connection to oracle (select * from TEMP$TEMP_TABLE);
disconnect from oracle;
quit;
Another thought - your two queries look identical apart from summing a different column in each. Why not combine them?
Sometimes running network transfers in parallel may improve overall performance, because the network infrastructure may limit individual TCP/IP connections to a certain bandwidth. Running transfer processes in parallel may increase the overall network throughput (I often observe this with SSH/SFTP transfers).
But in your case I'd simply compute the sums in one select.
You're right - all of you, the easy fix to make the queries faster is to put "Value" and "Qty" into the same select statement. However, please focus on how to run queries in parallel rather than why 😉 - it's only an example. I am pretty sure that SAS can do that, and I could like to check how much faster this approach would be - but I really could use some syntax help in this.
Running Parallel SAS jobs, where each job fires off a single query, requires having SAS/CONNECT licensed!
The MP Connect feature of SAS/CONNECT provides the functionality you are looking for.
Check out the following link for Sample template code http://support.sas.com/rnd/scalability/tricks/connect.html#smpmp
Hope this helps,
Ahmed
Ok focussing on the how.
You are calculating statistic on some grouped level, the coding however is not explicit pass thru but classic SAS ANSI SQL not very well aligned to all DB specifics.
You are creating SAS tables out of those SQL versions each of them causing the load with the network.
To eliminate the forcing of using SAS-tables you could think of using tempdb and share connections. SAS/ACCESS(R) 9.4 for Relational Databases: Reference, Sixth Edition
It will add a additional copy step from that tempdb to SAS but the processing by the RDBMS can be optimized for those running as much parallel in the RDBMS system.
Option 1
The goal getting it to execute it on the DBMS is using execute SAS/ACCESS(R) 9.4 for Relational Databases: Reference, Sixth Edition.
On some DBMSs, the DBMS-SQL-query argument can be a DBMS-specific SQL EXECUTE statement that executes a DBMS stored procedure. However, if the stored procedure contains more than one query, only the first query is processed. SQL server is supporting multiple variables as an aggregate type: Using Aggregate Functions in the Select List
Remember the process of getting all the data is running parallel on the side of the RDBMS!
Option 2
Going for the more advanced options. DS2 is supporting it all SAS(R) 9.4 DS2 Language Reference, Fourth Edition
It is a different design for accessing data connected to databases and a different way of programming.
All things can run parallel (threading) with FED-SQL SAS(R) 9.4 FedSQL Language Reference, Third Edition
Option 3
Stick with old coding habits and the limitations of proc sql. Run those SAS coding in parallel having them coded not sequential but all as single sas sources.
Choose whatever you can do of: a/ mp-connect - b/ Eguide parallel code - c/ grid processing - d/ batch jobs
The limitations are not the technical ones, but understanding those and getting used to them
Thanks! (Will take some time to work through ..)
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.