dvtarasov
Obsidian | Level 7

I need to sort, in IML, a matrix with a very large number (less than 500,000, but still hundreds of thousands) of rows. Am I right that the SORTNDX call is the same as SORT, but for very large matrices? If so, how large is large enough that SORTNDX rather than SORT should be used? Is there any "cut-off" figure or even a guideline for when to call SORT and when to call SORTNDX?

3 REPLIES
Rick_SAS
SAS Super FREQ

How many columns in your matrix?

dvtarasov
Obsidian | Level 7

There will be three or four columns (I am not yet sure).

Rick_SAS
SAS Super FREQ

For only 500,000 rows, it doesn't matter much unless you intend to repeat the sorting many times. IML will sort 500,000 rows in less than a second.
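
If you want to check the timing on your own machine, here is a minimal sketch. It assumes a random 500,000 x 4 matrix as a stand-in for your data; the seed, the dimensions, and the variable names are made up for illustration, and the elapsed time will depend on your hardware.

```sas
proc iml;
call randseed(12345);          /* arbitrary seed, for reproducibility only */
n = 500000;                    /* assumed row count from the question */
x = j(n, 4);                   /* allocate an n x 4 matrix */
call randgen(x, "Uniform");    /* fill it with random values */

t0 = time();
call sort(x, 1:4);             /* sort in place by columns 1 through 4 */
elapsed = time() - t0;         /* wall-clock seconds for the sort */
print elapsed;
quit;
```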

To answer your question, the SORTNDX call is not "the same as SORT, but for very large matrices." It has a different purpose: it returns a vector of row indices that indicates the sorted order of the rows, and it leaves the matrix itself unchanged. This sorting vector is sometimes called the "anti-rank."
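
As a small illustration of the sorting vector, here is a sketch on a made-up 3 x 2 matrix (the data and variable names are only for demonstration):

```sas
proc iml;
x = {3 10,
     1 20,
     2 30};
call sortndx(ndx, x, 1);   /* sort order by column 1; x is not modified */
sorted = x[ndx, ];         /* apply the index to get the sorted rows */
print ndx sorted;          /* ndx is the column vector {2, 3, 1} */
quit;
```

Row 2 holds the smallest value in column 1, then row 3, then row 1, so the index vector is {2, 3, 1}.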

To compute the sorting vector, SORTNDX has to sort the matrix while keeping track of where each row of the original matrix appears in the final sorted matrix. Thus, in general, SORTNDX has to do more work than SORT and would be slower.

I think the answer to your question depends on what you intend to do with the matrix. If you just want the sorted values, use SORT. If you want to do something like count the number of unique rows in the matrix, then the SORTNDX call in conjunction with the UNIQUEBY function might be worth using.
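
Here is a rough sketch of both cases on a small made-up matrix. The data and variable names are hypothetical, and I am assuming you want to treat all columns as sort keys.

```sas
proc iml;
x = {1 2,
     3 4,
     1 2,
     3 5,
     3 4};
by = 1:ncol(x);                      /* use every column as a sort key */

/* Case 1: count unique rows without changing x */
call sortndx(ndx, x, by);            /* index vector giving the sorted order */
firstRows = uniqueby(x, by, ndx);    /* positions in ndx where each BY group starts */
numUnique = nrow(firstRows);         /* 3 distinct rows: {1 2}, {3 4}, {3 5} */
uniqueRows = x[ndx[firstRows], ];    /* one representative row per group */
print numUnique, uniqueRows;

/* Case 2: you only need the sorted values */
call sort(x, by);                    /* sorts x in place */
print x;
quit;
```

Note that SORT overwrites the matrix, so copy it first if you still need the original order.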
