ZRick
Obsidian | Level 7

I have a simulation data set of over 200 million rows. I have to sort it using PROC SORT and MP CONNECT.

Can someone help me with the code template?

Ksharp
Super User

Such a huge table? If you have enough memory, I would choose a hash table, but there is usually not enough memory for that.

How about splitting it into several smaller tables?

Ksharp

DanielSantos
Barite | Level 11

The SORT procedure has been multithreaded since SAS 9. The THREADS option should be set by default, and the CPUCOUNT option should default to the number of CPUs in your system. Unless your system does not support multithreading or your SAS release is older than 9, the SORT procedure is already largely set up for parallelism.

More on SORT procedure here:

http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473663.htm00   
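To check these settings, something like the following sketch could be used. The library BIG, the data set SIM and the key ID are hypothetical names, not taken from the original question.

```sas
/* Show the current values of the two system options */
proc options option=threads;
run;
proc options option=cpucount;
run;

/* Turn threading on and let SAS use the number of CPUs it detects */
options threads cpucount=actual;

/* THREADS on the PROC SORT statement requests a threaded sort for this step */
/* BIG.SIM and ID are hypothetical names */
proc sort data=big.sim out=big.sim_sorted threads;
   by id;
run;
```

Whether the threaded sort is actually faster depends mostly on memory and I/O, so it is worth comparing the log timings with and without THREADS.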

Should you still need to use the MP CONNECT feature, here's what I use:

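/* Let RSUBMIT sign on automatically; "!sascmd" starts each new session with the same command as the current one */
options autosignon=yes sascmd="!sascmd";

/* WAIT=NO submits each block asynchronously, so the three sessions run in parallel */
rsubmit process=thread1 wait=no;
   /*** code here ***/
endrsubmit;

rsubmit process=thread2 wait=no;
   /*** code here ***/
endrsubmit;

rsubmit process=thread3 wait=no;
   /*** code here ***/
endrsubmit;

/* Block until all three sessions have finished */
waitfor _all_ thread1 thread2 thread3;

options autosignon=no;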

And a lot more here:

http://support.sas.com/rnd/scalability/tricks/index.html
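As a rough illustration of how the template could be filled in for a sort, here is a sketch that sorts two halves of the table in parallel and then interleaves them. The library BIG, the data set SIM, the key ID and the split point of 100,000,000 rows are all assumptions made for the example.

```sas
options autosignon=yes sascmd="!sascmd";

/* INHERITLIB= makes the client's BIG library visible in the spawned session */
rsubmit process=thread1 wait=no inheritlib=(big);
   /* First half: observations 1 to 100,000,000 (hypothetical split point) */
   proc sort data=big.sim(firstobs=1 obs=100000000) out=big.part1;
      by id;
   run;
endrsubmit;

rsubmit process=thread2 wait=no inheritlib=(big);
   /* Second half: observation 100,000,001 to the end */
   proc sort data=big.sim(firstobs=100000001) out=big.part2;
      by id;
   run;
endrsubmit;

/* Wait for both sorts to finish before combining */
waitfor _all_ thread1 thread2;

/* SET with BY interleaves the two sorted parts into one sorted table */
data big.sim_sorted;
   set big.part1 big.part2;
   by id;
run;

signoff _all_;
options autosignon=no;
```

The interleaving step reads and writes every row once more, so this only pays off if the parallel sorts save more time than the extra pass costs.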

Cheers from Portugal.

Daniel Santos @ www.cgd.pt

Patrick
Opal | Level 21

As Daniel says: Proc Sort is multithreaded.

I don't see how MP_Connect could help you. What would help (and I've seen this recently) is if your source data is stored using the SPDE engine as this will allow for multithreading. Even better would be to define the SPDE library using several disks as this would give you better I/O.
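A minimal sketch of such an SPDE library definition, assuming hypothetical directories on separate disks:

```sas
/* The primary path holds the metadata; DATAPATH= spreads the data partitions */
/* over several disks and INDEXPATH= puts index files on yet another disk.    */
/* All paths and the libref SPDELIB are hypothetical.                         */
libname spdelib spde '/disk1/spde/meta'
   datapath=('/disk2/spde/data' '/disk3/spde/data')
   indexpath=('/disk4/spde/index');

/* Copy the table into the SPDE library (BIG.SIM is a hypothetical source) */
data spdelib.sim;
   set big.sim;
run;
```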

Using a hash:

With 200M rows, only if you have a LOT of addressable memory. A while ago I tested on my Win7 8GB laptop how many keys I could store in a hash, and I remember that SAS crashed at around 200M distinct (numeric) values.
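For reference, sorting through a hash object usually looks something like the sketch below. BIG.SIM and ID are hypothetical names, and the whole table has to fit in memory for this to work.

```sas
data _null_;
   /* Never executed; only defines the variables of BIG.SIM in the data step */
   if 0 then set big.sim;

   /* ORDERED:'a' keeps keys in ascending order; MULTIDATA:'y' keeps duplicate keys */
   declare hash h(dataset:'big.sim', ordered:'a', multidata:'y');
   h.defineKey('id');
   h.defineData(all:'y');
   h.defineDone();

   /* Write the hash content, in key order, to a new table */
   h.output(dataset:'big.sim_sorted');
   stop;
run;
```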

Not sure what you need to do, but it might also be worth thinking about indexing your dataset instead of sorting it. Again, using the SPDE engine would be beneficial. I've seen cases where people indexed huge datasets (using multiple columns) and were then astonished that it didn't help much. What happened was that the index grew to something like 20% of the size of the table, and with all the overhead of first reading the index and then the data in the table, overall performance didn't get much better (the bottleneck was I/O).
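Creating a simple index could look like this sketch, again with the hypothetical names BIG, SIM and ID:

```sas
/* Build a simple index on ID without rewriting the table */
proc datasets library=big nolist;
   modify sim;
   index create id;
quit;

/* With the index in place, BY processing works on the unsorted table */
data _null_;
   set big.sim;
   by id;
run;
```

Reading in BY order through the index means random access to the table, which is where the I/O overhead described above comes from.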

So 200M rows stored in a standard SAS table will be a challenge in any case. Make sure it is stored on the fastest disk available, and if you can influence the settings in your environment in any way, make sure that the WORK location and UTILLOC (where intermediate sort "slices" get stored) are not on the same disk (same controller). It's very, very likely all about I/O for you.

One last thing: using PROC SORT .... NOEQUALS frees SAS from preserving the original order of observations with identical BY values, which allows a more efficient sort. EQUALS is the default, so the option has to be set explicitly.
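For example, with the hypothetical names BIG.SIM and ID:

```sas
/* NOEQUALS: rows with the same ID may come out in any relative order */
proc sort data=big.sim out=big.sim_sorted noequals;
   by id;
run;
```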

Patrick
Opal | Level 21

What do you mean by "partitioned data"? Where is the data stored: in SAS or in a database? If it is a database, which one and which version, and is the table partitioned, and how?

I found this link which is quite interesting and possibly what you have in mind: "Piping Between Data Step and Proc Sort on SMP Machine", http://support.sas.com/rnd/scalability/tricks/connect.html#pipds

Message was edited by: Patrick
