11-24-2015 09:57 AM
I need to sort, in IML, a matrix with a very large number (less than 500,000, but still hundreds of thousands) of rows. Am I right that the SORTNDX call is the same as SORT, but for very large matrices? If so, how large is large enough that SORTNDX rather than SORT should be used? Is there any "cut-off" figure or even a guideline for when to call SORT and when to call SORTNDX?
11-25-2015 08:56 AM
For only 500,000 rows, it doesn't matter much unless you intend to repeat the sorting many times. IML will sort 500,000 rows in less than a second.
To answer your question, the SORTNDX call is not "the same as SORT, but for very large matrices." It has a different purpose: it returns an index vector that gives the order in which the rows of the original matrix appear after sorting. This sorting vector is sometimes called the "anti-rank."
To compute the sorting vector, SORTNDX has to sort the matrix while keeping track of where each row of the original matrix ends up in the final sorted matrix. Thus, in general, SORTNDX has to do more work than SORT and would be slower.
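As a rough illustration of what the sorting vector looks like, here is a minimal sketch on a small made-up matrix (the values and the names x, ndx, and sorted are only for illustration). It assumes you want to sort by column 1 and then by column 2:

```sas
proc iml;
/* small made-up matrix, for illustration only */
x = {3 1,
     1 2,
     2 5,
     1 1};

/* ndx receives the row order that sorts x by column 1, then column 2 */
call sortndx(ndx, x, {1 2});     /* ndx = {4, 2, 3, 1} */

/* apply the sorting vector; x itself is left unchanged */
sorted = x[ndx, ];
print ndx sorted;
quit;
```

If all you need is the sorted matrix, calling SORT on x with the same columns gives the same rows as x[ndx, ] without the extra index vector.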
I think the answer to your question depends on what you intend to do with the matrix. If you just want the sorted values, use SORT. If you want to do something like count the number of unique rows in the matrix, then the SORTNDX call in conjunction with the UNIQUEBY function might be worth using.
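For the unique-rows case, something along these lines should work. This is only a sketch on made-up data, and it assumes that every column takes part in the comparison; the names cols, firstRow, numUnique, and uniqueRows are hypothetical:

```sas
proc iml;
/* made-up matrix with repeated rows, for illustration only */
x = {1 2,
     3 4,
     1 2,
     5 6,
     3 4};

cols = 1:ncol(x);                    /* assumption: compare rows on all columns */
call sortndx(ndx, x, cols);          /* sorting vector for the rows of x */

/* positions within ndx at which each distinct row first occurs */
firstRow = uniqueby(x, cols, ndx);

numUnique  = nrow(firstRow);         /* 3 distinct rows in this example */
uniqueRows = x[ndx[firstRow], ];     /* the distinct rows, in sorted order */
print numUnique, uniqueRows;
quit;
```

The matrix x is never rearranged here; only the index vector is, which is the main reason to prefer SORTNDX over SORT for this kind of task.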