Solved: Run queries in parallel

user24feb · Posted 04-13-2015 03:24 AM

Hello!

Is it possible to transform this sql statement to be carried out in parallel?

Proc SQL NoPrint;

Connect To ODBC (DSN=sassql);

Create Table Data.a1 As Select * From Connection To ODBC

(

Select KeyDate_YYYYMM, MonthDifference_Sales, Sum(OpenOrder_Value_BillingNotDue) As Value

From vFact_Orderbacklog_SAS

Where

KeyDate_YYYYMM>=200601

AND ScheduleLineCategory_Key not in ('XX', 'YY', 'ZZ')

AND Ultimo_Filter_Reporting =1

AND BusinessUnit_Key =8

Group By KeyDate_YYYYMM, MonthDifference_Sales

);

Create Table DAta.a2 As Select * From Connection To ODBC

(

Select KeyDate_YYYYMM, MonthDifference_Sales, Sum(OpenOrder_Qty_BillingNotDue_t) As Qty

From vFact_Orderbacklog_SAS

Where

KeyDate_YYYYMM>=200601

AND ScheduleLineCategory_Key not in ('XX', 'YY', 'ZZ')

AND Ultimo_Filter_Reporting =1

AND BusinessUnit_Key =8

Group By KeyDate_YYYYMM, MonthDifference_Sales

);

Disconnect From ODBC;

Quit;

Thx&kind regards

AhmedAl_Attar · Posted 04-13-2015 06:21 AM

Running Parallel SAS jobs, where each job fires off a single query, requires having SAS/CONNECT licensed!

The MP Connect feature of SAS/CONNECT provides the functionality you are looking for.

Check out the following link for Sample template code http://support.sas.com/rnd/scalability/tricks/connect.html#smpmp

Hope this helps,

Ahmed

View solution in original post

SASKiwi · Posted 04-13-2015 03:40 AM

Yes. For example you could for example run two or more queries at the same time with minor changes to your date selection. this would allow you to read a year or more data in each query. You could then combine the results of each query once they are complete.

Query 1:

Where

KeyDate_YYYYMM>=200601 and KeyDate_YYYYMM <200701

Query 2:

Where

KeyDate_YYYYMM>=200701 and KeyDate_YYYYMM <200801

user24feb · Posted 04-13-2015 03:56 AM

You're right, this could lead to an improvement, but I was thinking of some kind of multi-threading. The trouble is, that my programs don't work out yet (and I would prefer to stick with proc sql when accessing the SQL-server).

SASKiwi · Posted 04-13-2015 04:21 AM

Multi-threading isn't really the issue here as this is not a CPU intensive process. It is IO and probably network bandwidth between SQL Server and SAS that is the bottleneck. I've had good success with the approach I've suggested. I can only suggest you try it for yourself and see what improvements you get. I've done as many as 8 or so parallel queries like this and it has usually turned out to be the network bandwidth being the worst bottleneck. If both servers are in the same server room then it will be very fast but if they are across town or across country.....in the worst scenario I've dealt with the SQL server was in a different country with a 2 or 3 thousand km of undersea fibre-optic cable in between!

RW9 · Posted 04-13-2015 04:36 AM

Just to add, would executing some SQL on the server to create the final table, then selecting from that be quicker (i.e. processing all on DB, then extract results)?

proc sql;

connect to oracle (...);

/* Create a temp table with all the needed results */

execute (create table TEMP$TEMP_TABLE as select ... from ...) by oracle;

/* Now extract the results to SAS */

create table MY_RESULTS as

select * from connection to oracle (select * from TEMP$TEMP_TABLE);

disconnect from oracle;

quit;

SASKiwi · Posted 04-13-2015 04:43 AM

Another thought - your two queries look identical apart from summing a different column in each. Why not combine them?

Kurt_Bremser · Posted 04-13-2015 05:04 AM

Sometimes running network transfers in parallel may improve overall performance, because the network infrastructure may limit individual TCP/IP connections to a certain bandwidth. Running transfer processes in parallel may increase the overall network throughput (I often observe this with SSH/SFTP transfers).

But in your case I'd simply compute the sums in one select.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

user24feb · Posted 04-13-2015 05:27 AM

You're right - all of you, the easy fix to make the queries faster is to put "Value" and "Qty" into the same select statement. However, please focus on how to run queries in parallel rather than why 😉 - it's only an example. I am pretty sure that SAS can do that, and I could like to check how much faster this approach would be - but I really could use some syntax help in this.

AhmedAl_Attar · Posted 04-13-2015 06:21 AM

Running Parallel SAS jobs, where each job fires off a single query, requires having SAS/CONNECT licensed!

The MP Connect feature of SAS/CONNECT provides the functionality you are looking for.

Check out the following link for Sample template code http://support.sas.com/rnd/scalability/tricks/connect.html#smpmp

Hope this helps,

Ahmed

jakarman · Posted 04-13-2015 06:55 AM

Ok focussing on the how.
You are calculating statistic on some grouped level, the coding however is not explicit pass thru but classic SAS ANSI SQL not very well aligned to all DB specifics.
You are creating SAS tables out of those SQL versions each of them causing the load with the network.

To eliminate the forcing of using SAS-tables you could think of using tempdb and share connections. SAS/ACCESS(R) 9.4 for Relational Databases: Reference, Sixth Edition

It will add a additional copy step from that tempdb to SAS but the processing by the RDBMS can be optimized for those running as much parallel in the RDBMS system.

Option 1

The goal getting it to execute it on the DBMS is using execute SAS/ACCESS(R) 9.4 for Relational Databases: Reference, Sixth Edition.

On some DBMSs, the DBMS-SQL-query argument can be a DBMS-specific SQL EXECUTE statement that executes a DBMS stored procedure. However, if the stored procedure contains more than one query, only the first query is processed. SQL server is supporting multiple variables as an aggregate type: Using Aggregate Functions in the Select List

Remember the process of getting all the data is running parallel on the side of the RDBMS!

Option 2

Going for the more advanced options. DS2 is supporting it all SAS(R) 9.4 DS2 Language Reference, Fourth Edition

It is a different design for accessing data connected to databases and a different way of programming.

All things can run parallel (threading) with FED-SQL SAS(R) 9.4 FedSQL Language Reference, Third Edition

Option 3

Stick with old coding habits and the limitations of proc sql. Run those SAS coding in parallel having them coded not sequential but all as single sas sources.

Choose whatever you can do of: a/ mp-connect - b/ Eguide parallel code - c/ grid processing - d/ batch jobs

The limitations are not the technical ones, but understanding those and getting used to them

---->-- ja karman --<-----

user24feb · Posted 04-13-2015 07:42 AM

Thanks! (Will take some time to work through ..)

Run queries in parallel

Re: Run queries in parallel

Re: Run queries in parallel

Re: Run queries in parallel

Re: Run queries in parallel

Re: Run queries in parallel

Re: Run queries in parallel

Re: Run queries in parallel

Re: Run queries in parallel

Re: Run queries in parallel

Re: Run queries in parallel

Re: Run queries in parallel

SAS Innovate 2026 Registration is Open