I need to sort, in IML, a matrix with a very large number (less than 500,000, but still hundreds of thousands) of rows. Am I right that the SORTNDX call is the same as SORT, but for very large matrices? If so, how large is large enough that SORTNDX rather than SORT should be used? Is there any "cut-off" figure or even a guideline for when to call SORT and when to call SORTNDX?
How many columns in your matrix?
There will be three or four columns (I am not yet sure).
For only 500,000 rows, it doesn't matter much unless you intend to repeat the sorting many times. IML will sort 500,000 rows in less than a second.
To answer your question, the SORTNDX function is not "the same as SORT, but for very large matrices." It has a different purpose, namely to give you a vector that indicates the sorting order of the rows. This sorting vector is sometimes called the "anti-rank."
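To make the difference concrete, here is a minimal sketch using a small made-up matrix (the values and the names `x`, `y`, `ndx`, and `sortedViaNdx` are only for illustration). It assumes you want to sort by the first column. SORTNDX leaves the matrix alone and returns the sorting vector, whereas SORT rearranges the matrix in place.

```sas
proc iml;
/* Small made-up matrix; the values are for illustration only */
x = {3 1,
     1 2,
     2 0};

/* SORTNDX returns the sorting vector in ndx and leaves x unchanged */
call sortndx(ndx, x, 1);       /* sort key: column 1, ascending */
print ndx;                     /* row numbers of x, listed in sorted order */

/* Using ndx as a row subscript produces the sorted matrix */
sortedViaNdx = x[ndx, ];

/* SORT rearranges the matrix in place, so work on a copy */
y = x;
call sort(y, 1);

print sortedViaNdx y;          /* the two matrices should be identical */
quit;
```

If all you need is the sorted matrix, the single `call sort` is enough; the index vector is useful when you want to keep the original matrix and apply the same ordering elsewhere.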
To compute the sorting vector, SORTNDX has to sort the matrix while keeping track of where each row of the original matrix appears in the final sorted matrix. Thus, in general, SORTNDX has to do more work than SORT and would be slower.
I think the answer to your question depends on what you intend to do with the matrix. If you just want the sorted values, use SORT. If you want to do something like count the number of unique rows in the matrix, then the SORTNDX function in conjunction with the UNIQUEBY function might be worth using.
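As a rough sketch of the unique-rows idea, assuming that two rows count as equal when they match in every column (the data and the names `cols`, `uniqRows`, `numUnique`, and `firstOfEach` are made up for illustration):

```sas
proc iml;
/* Made-up data containing duplicate rows */
x = {1 2,
     3 4,
     1 2,
     3 4,
     5 6};

cols = 1:ncol(x);                     /* use every column as a sort key */
call sortndx(ndx, x, cols);           /* sorting vector; x is not modified */

/* UNIQUEBY returns the positions within ndx at which each group of
   identical rows begins */
uniqRows = uniqueby(x, cols, ndx);
numUnique = nrow(colvec(uniqRows));   /* number of distinct rows */

/* One representative row from each group, taken from the original x */
firstOfEach = x[ndx[uniqRows], ];

print numUnique, firstOfEach;
quit;
```

Note that the values returned by UNIQUEBY index into `ndx`, not directly into `x`, which is why the last step subscripts with `ndx[uniqRows]`.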