deleted_user
Not applicable
Let's say I have a DI Studio 4.2 job that looks like this:

SAS extract (40 million rows) --> SQL Join with A --> Tempobject1 --> Lookup with B --> Tempobject2 --> SCD Loader --> Tempobject3 --> Table Loader --> ORA target

I'm running this job on a server which has 4 nodes (CPUs).

My question is: is it possible in DI Studio to run this job on all 4 CPUs simultaneously? By this I mean the SAS extract of 40 million rows would be split into, say, 10 million each, and each CPU would read its 10 million rows, do the necessary lookups, SQL joins etc., and finally load into Oracle, all in parallel.

This is a common feature in ETL tools like DataStage/Informatica, so I'm just wondering if DI Studio has anything like it.

Appreciate any inputs on this!
1 REPLY 1
LinusH
Tourmaline | Level 20
Well, this takes some explanation. First you have to understand what SAS can do in parallel and how, and then how that affects your DI Studio design.

I/O: SAS can do multithreaded reads if the data source supports it. This includes the SAS engines SPDE and SPDS, and several external RDBMSs via SAS/ACCESS (including Oracle). This happens "automatically" and is controlled by system, libname and data set options.

Processing: on a row level, SAS has a couple of procedures (including parts of SQL) that run in parallel. If the source data for a step resides in Oracle, SAS can push the SQL down to the database, and then the RDBMS is responsible for the parallelism.

On a higher level, when you have different chunks of data, they can be processed concurrently. This is done at the application level using either MP CONNECT or Grid Manager (which in turn uses Platform LSF). Which of these to use depends on your license and your server infrastructure. To use this kind of parallelism you have to build your ETL flow accordingly. Take a look at the DI Studio doc for guidance.
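
To make the chunk-level idea concrete, here is a minimal sketch written in Python rather than SAS. It is not DI Studio, MP CONNECT or Grid Manager code; it only illustrates the pattern of splitting the input, processing the chunks concurrently, and recombining the results. The function names (`split_ranges`, `process_chunk`, `run_in_parallel`), the dictionary-based lookup and the default of 4 workers are all assumptions made for illustration. Threads are used only to keep the sketch short; with MP CONNECT each chunk would instead run in its own SAS session.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(n_rows, n_parts):
    """Return (start, stop) row ranges covering 0..n_rows in n_parts near-equal slices."""
    base, extra = divmod(n_rows, n_parts)
    ranges, start = [], 0
    for i in range(n_parts):
        # The first `extra` slices take one additional row each.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def process_chunk(rows, lookup):
    """Hypothetical per-chunk work: look up each key, defaulting to 'UNKNOWN'."""
    return [(key, lookup.get(key, "UNKNOWN")) for key in rows]


def run_in_parallel(rows, lookup, n_workers=4):
    """Split rows into n_workers slices, process them concurrently, recombine."""
    ranges = split_ranges(len(rows), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(process_chunk, rows[a:b], lookup) for a, b in ranges]
        # Collecting in submission order keeps the combined output in source order.
        results = [f.result() for f in futures]
    return [record for chunk in results for record in chunk]
```

With 40 million rows and 4 workers, `split_ranges` yields four slices of 10 million rows each, which matches the split described in the question. Note that the pattern only works this simply when each chunk can be processed independently of the others.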

Conclusion: your flow is probably already partly parallelized. To make more steps run in parallel, look at your option settings and your Oracle load settings. If the need is urgent, you may have to rebuild the flow to split your data so that it can run in parallel, but be careful: not all types of processes will benefit from this.

/Linus
Data never sleeps
