SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Run queries in parallel

Accepted Solution Solved
Reply
Super Contributor
Posts: 336
Accepted Solution

Run queries in parallel


Hello!

Is it possible to transform this sql statement to be carried out in parallel?

Proc SQL NoPrint;

Connect To ODBC (DSN=sassql);

Create Table Data.a1 As Select * From Connection To ODBC

(

Select KeyDate_YYYYMM, MonthDifference_Sales, Sum(OpenOrder_Value_BillingNotDue) As Value

From vFact_Orderbacklog_SAS

Where

KeyDate_YYYYMM>=200601

AND ScheduleLineCategory_Key not in ('XX', 'YY', 'ZZ')

AND Ultimo_Filter_Reporting =1

AND BusinessUnit_Key =8

Group By KeyDate_YYYYMM, MonthDifference_Sales

);

Create Table DAta.a2 As Select * From Connection To ODBC

(

Select KeyDate_YYYYMM, MonthDifference_Sales, Sum(OpenOrder_Qty_BillingNotDue_t) As Qty

From vFact_Orderbacklog_SAS

Where

KeyDate_YYYYMM>=200601

AND ScheduleLineCategory_Key not in ('XX', 'YY', 'ZZ')

AND Ultimo_Filter_Reporting =1

AND BusinessUnit_Key =8

Group By KeyDate_YYYYMM, MonthDifference_Sales

);

Disconnect From ODBC;

Quit;

Thx&kind regards


Accepted Solutions
Solution
‎04-13-2015 06:21 AM
Regular Contributor
Posts: 213

Re: Run queries in parallel

Running Parallel SAS jobs, where each job fires off a single query, requires having SAS/CONNECT licensed!

The MP Connect feature of SAS/CONNECT provides the functionality you are looking for.


Check out the following link for Sample template code http://support.sas.com/rnd/scalability/tricks/connect.html#smpmp

Hope this helps,

Ahmed

View solution in original post


All Replies
Super User
Posts: 3,101

Re: Run queries in parallel

Yes. For example you could for example run two or more queries at the same time with minor changes to your date selection. this would allow you to read a year or more data in each query. You could then combine the results of each query once they are complete.

Query 1:

Where

KeyDate_YYYYMM>=200601 and KeyDate_YYYYMM <200701

Query 2:

Where

KeyDate_YYYYMM>=200701 and KeyDate_YYYYMM <200801

Super Contributor
Posts: 336

Re: Run queries in parallel

You're right, this could lead to an improvement, but I was thinking of some kind of multi-threading. The trouble is, that my programs don't work out yet (and I would prefer to stick with proc sql when accessing the SQL-server).

Super User
Posts: 3,101

Re: Run queries in parallel

Multi-threading isn't really the issue here as this is not a CPU intensive process. It is IO and probably network bandwidth between SQL Server and SAS that is the bottleneck. I've had good success with the approach I've suggested. I can only suggest you try it for yourself and see what improvements you get. I've done as many as 8 or so parallel queries like this and it has usually turned out to be the network bandwidth being the worst bottleneck. If both servers are in the same server room then it will be very fast but if they are across town or across country.....in the worst scenario I've dealt with the SQL server was in a different country with a 2 or 3 thousand km of undersea fibre-optic cable in between! 

Super User
Super User
Posts: 7,392

Re: Run queries in parallel

Just to add, would executing some SQL on the server to create the final table, then selecting from that be quicker (i.e. processing all on DB, then extract results)?

proc sql;

     connect to oracle (...);

     /* Create a temp table with all the needed results */

     execute (create table TEMP$TEMP_TABLE as select ... from ...) by oracle;

     /* Now extract the results to SAS */

     create table MY_RESULTS as

     select * from connection to oracle (select * from TEMP$TEMP_TABLE);

     disconnect from oracle;

quit;

Super User
Posts: 3,101

Re: Run queries in parallel

Another thought - your two queries look identical apart from summing a different column in each. Why not combine them?

Super User
Posts: 6,928

Re: Run queries in parallel

Sometimes running network transfers in parallel may improve overall performance, because the network infrastructure may limit individual TCP/IP connections to a certain bandwidth. Running transfer processes in parallel may increase the overall network throughput (I often observe this with SSH/SFTP transfers).

But in your case I'd simply compute the sums in one select.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super Contributor
Posts: 336

Re: Run queries in parallel

You're right - all of you, the easy fix to make the queries faster is to put "Value" and "Qty" into the same select statement. However, please focus on how to run queries in parallel rather than why ;-) - it's only an example. I am pretty sure that SAS can do that, and I could like to check how much faster this approach would be - but I really could use some syntax help in this.

Solution
‎04-13-2015 06:21 AM
Regular Contributor
Posts: 213

Re: Run queries in parallel

Running Parallel SAS jobs, where each job fires off a single query, requires having SAS/CONNECT licensed!

The MP Connect feature of SAS/CONNECT provides the functionality you are looking for.


Check out the following link for Sample template code http://support.sas.com/rnd/scalability/tricks/connect.html#smpmp

Hope this helps,

Ahmed

Valued Guide
Posts: 3,208

Re: Run queries in parallel

Ok focussing on the how.
You are calculating statistic on some grouped level, the coding however is not explicit pass thru but classic SAS ANSI SQL not very well aligned to all DB specifics.
You are creating SAS tables out of those SQL versions each of them causing the load with the network.

To eliminate the forcing of using SAS-tables you could think of using tempdb and share connections. SAS/ACCESS(R) 9.4 for Relational Databases: Reference, Sixth Edition

It will add a additional copy step from that tempdb to SAS but the processing by the RDBMS can be optimized for those running as much parallel in the RDBMS system.

Option 1

The goal getting it to execute it on the DBMS  is using execute SAS/ACCESS(R) 9.4 for Relational Databases: Reference, Sixth Edition.

On some DBMSs, the DBMS-SQL-query argument can be a DBMS-specific SQL EXECUTE statement that executes a DBMS stored procedure.  However, if the stored procedure              contains more than one query, only the first query is processed.  SQL server is supporting multiple variables as an aggregate type:  Using Aggregate Functions in the Select List

Remember the process of getting all the data is running parallel on the side of the RDBMS!

Option 2

Going for the more advanced options. DS2 is supporting it all SAS(R) 9.4 DS2 Language Reference, Fourth Edition

It is a different design for accessing data connected to databases and a different way of programming.

All things can run parallel (threading) with FED-SQL SAS(R) 9.4 FedSQL Language Reference, Third Edition

 

Option 3

Stick with old coding habits and the limitations of proc sql. Run those SAS coding in parallel  having them coded not sequential but all as single sas sources.

Choose whatever you can do of: a/ mp-connect - b/ Eguide parallel code - c/ grid processing - d/ batch jobs

The limitations are not the technical ones, but understanding those and getting used to them 

---->-- ja karman --<-----
Super Contributor
Posts: 336

Re: Run queries in parallel

Thanks! (Will take some time to work through ..)

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 1609 views
  • 3 likes
  • 6 in conversation