BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
user24feb
Barite | Level 11


Hello!

Is it possible to transform this sql statement to be carried out in parallel?

Proc SQL NoPrint;

Connect To ODBC (DSN=sassql);

Create Table Data.a1 As Select * From Connection To ODBC

(

Select KeyDate_YYYYMM, MonthDifference_Sales, Sum(OpenOrder_Value_BillingNotDue) As Value

From vFact_Orderbacklog_SAS

Where

KeyDate_YYYYMM>=200601

AND ScheduleLineCategory_Key not in ('XX', 'YY', 'ZZ')

AND Ultimo_Filter_Reporting =1

AND BusinessUnit_Key =8

Group By KeyDate_YYYYMM, MonthDifference_Sales

);

Create Table DAta.a2 As Select * From Connection To ODBC

(

Select KeyDate_YYYYMM, MonthDifference_Sales, Sum(OpenOrder_Qty_BillingNotDue_t) As Qty

From vFact_Orderbacklog_SAS

Where

KeyDate_YYYYMM>=200601

AND ScheduleLineCategory_Key not in ('XX', 'YY', 'ZZ')

AND Ultimo_Filter_Reporting =1

AND BusinessUnit_Key =8

Group By KeyDate_YYYYMM, MonthDifference_Sales

);

Disconnect From ODBC;

Quit;

Thx&kind regards

1 ACCEPTED SOLUTION

Accepted Solutions
AhmedAl_Attar
Rhodochrosite | Level 12

Running Parallel SAS jobs, where each job fires off a single query, requires having SAS/CONNECT licensed!

The MP Connect feature of SAS/CONNECT provides the functionality you are looking for.


Check out the following link for Sample template code http://support.sas.com/rnd/scalability/tricks/connect.html#smpmp

Hope this helps,

Ahmed

View solution in original post

10 REPLIES 10
SASKiwi
PROC Star

Yes. For example you could for example run two or more queries at the same time with minor changes to your date selection. this would allow you to read a year or more data in each query. You could then combine the results of each query once they are complete.

Query 1:

Where

KeyDate_YYYYMM>=200601 and KeyDate_YYYYMM <200701

Query 2:

Where

KeyDate_YYYYMM>=200701 and KeyDate_YYYYMM <200801

user24feb
Barite | Level 11

You're right, this could lead to an improvement, but I was thinking of some kind of multi-threading. The trouble is, that my programs don't work out yet (and I would prefer to stick with proc sql when accessing the SQL-server).

SASKiwi
PROC Star

Multi-threading isn't really the issue here as this is not a CPU intensive process. It is IO and probably network bandwidth between SQL Server and SAS that is the bottleneck. I've had good success with the approach I've suggested. I can only suggest you try it for yourself and see what improvements you get. I've done as many as 8 or so parallel queries like this and it has usually turned out to be the network bandwidth being the worst bottleneck. If both servers are in the same server room then it will be very fast but if they are across town or across country.....in the worst scenario I've dealt with the SQL server was in a different country with a 2 or 3 thousand km of undersea fibre-optic cable in between! 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Just to add, would executing some SQL on the server to create the final table, then selecting from that be quicker (i.e. processing all on DB, then extract results)?

proc sql;

     connect to oracle (...);

     /* Create a temp table with all the needed results */

     execute (create table TEMP$TEMP_TABLE as select ... from ...) by oracle;

     /* Now extract the results to SAS */

     create table MY_RESULTS as

     select * from connection to oracle (select * from TEMP$TEMP_TABLE);

     disconnect from oracle;

quit;

SASKiwi
PROC Star

Another thought - your two queries look identical apart from summing a different column in each. Why not combine them?

Kurt_Bremser
Super User

Sometimes running network transfers in parallel may improve overall performance, because the network infrastructure may limit individual TCP/IP connections to a certain bandwidth. Running transfer processes in parallel may increase the overall network throughput (I often observe this with SSH/SFTP transfers).

But in your case I'd simply compute the sums in one select.

user24feb
Barite | Level 11

You're right - all of you, the easy fix to make the queries faster is to put "Value" and "Qty" into the same select statement. However, please focus on how to run queries in parallel rather than why 😉 - it's only an example. I am pretty sure that SAS can do that, and I could like to check how much faster this approach would be - but I really could use some syntax help in this.

AhmedAl_Attar
Rhodochrosite | Level 12

Running Parallel SAS jobs, where each job fires off a single query, requires having SAS/CONNECT licensed!

The MP Connect feature of SAS/CONNECT provides the functionality you are looking for.


Check out the following link for Sample template code http://support.sas.com/rnd/scalability/tricks/connect.html#smpmp

Hope this helps,

Ahmed

jakarman
Barite | Level 11

Ok focussing on the how.
You are calculating statistic on some grouped level, the coding however is not explicit pass thru but classic SAS ANSI SQL not very well aligned to all DB specifics.
You are creating SAS tables out of those SQL versions each of them causing the load with the network.

To eliminate the forcing of using SAS-tables you could think of using tempdb and share connections. SAS/ACCESS(R) 9.4 for Relational Databases: Reference, Sixth Edition

It will add a additional copy step from that tempdb to SAS but the processing by the RDBMS can be optimized for those running as much parallel in the RDBMS system.

Option 1

The goal getting it to execute it on the DBMS  is using execute SAS/ACCESS(R) 9.4 for Relational Databases: Reference, Sixth Edition.

On some DBMSs, the DBMS-SQL-query argument can be a DBMS-specific SQL EXECUTE statement that executes a DBMS stored procedure.  However, if the stored procedure              contains more than one query, only the first query is processed.  SQL server is supporting multiple variables as an aggregate type:  Using Aggregate Functions in the Select List

Remember the process of getting all the data is running parallel on the side of the RDBMS!

Option 2

Going for the more advanced options. DS2 is supporting it all SAS(R) 9.4 DS2 Language Reference, Fourth Edition

It is a different design for accessing data connected to databases and a different way of programming.

All things can run parallel (threading) with FED-SQL SAS(R) 9.4 FedSQL Language Reference, Third Edition

 

Option 3

Stick with old coding habits and the limitations of proc sql. Run those SAS coding in parallel  having them coded not sequential but all as single sas sources.

Choose whatever you can do of: a/ mp-connect - b/ Eguide parallel code - c/ grid processing - d/ batch jobs

The limitations are not the technical ones, but understanding those and getting used to them 

---->-- ja karman --<-----
user24feb
Barite | Level 11

Thanks! (Will take some time to work through ..)

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 6589 views
  • 3 likes
  • 6 in conversation