SORT vs. SORTNDX

11-24-2015 09:57 AM

I need to sort, in IML, a matrix with a very large number (less than 500,000, but still hundreds of thousands) of rows. Am I right that the SORTNDX call is the same as SORT, but for very large matrices? If so, how large is large enough that SORTNDX rather than SORT should be used? Is there any "cut-off" figure or even a guideline for when to call SORT and when to call SORTNDX?

11-24-2015 11:01 AM

How many columns in your matrix?

11-24-2015 11:22 AM

There will be three or four columns (I am not yet sure).

11-25-2015 08:56 AM

For only 500,000 rows, it doesn't matter much unless you intend to repeat the sorting many times. IML will sort 500,000 rows in less than a second.

To answer your question: no, the SORTNDX call is not "the same as SORT, but for very large matrices." It has a different purpose, namely to return a vector that gives the sorting order of the rows, while leaving the matrix itself unchanged. This sorting vector is sometimes called the "anti-rank."

To compute the sorting vector, SORTNDX has to sort the matrix while keeping track of where each row of the original matrix appears in the final sorted matrix. Thus, in general, SORTNDX has to do more work than SORT and would be slower.
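As a rough illustration of the difference, here is a minimal sketch. The matrix `x` and the names `idx` and `sortedCopy` are made up for the example; the sort key (column 1) is an assumption, so substitute your own columns.

```sas
proc iml;
x = {3 1,
     1 2,
     2 5};                     /* small made-up matrix; rows have distinct keys */

/* SORTNDX: x is left unchanged; idx receives the row order */
call sortndx(idx, x, 1);       /* sort key: column 1 (assumed for the example) */
sortedCopy = x[idx, ];         /* apply the sorting vector to get sorted rows  */

/* SORT: x itself is overwritten with the sorted rows */
call sort(x, 1);

print idx, sortedCopy, x;      /* sortedCopy and x should contain the same rows */
quit;
```

If all you need is the sorted matrix, the CALL SORT line alone does the job; the index vector is only worth computing when you need to know where the rows came from.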

I think the answer to your question depends on what you intend to do with the matrix. If you just want the sorted values, use SORT. If you want to do something like count the number of unique rows in the matrix, then the SORTNDX call in conjunction with the UNIQUEBY function might be worth using.
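For the unique-rows case, something along these lines should work. This is only a sketch: the data and the names `cols`, `idx`, `firstLoc`, `numUnique`, and `uniqueRows` are invented for the example, and it assumes you want to compare rows on all columns.

```sas
proc iml;
x = {1 2,
     3 4,
     1 2,
     5 6,
     3 4};                           /* made-up data with repeated rows */
cols = 1:ncol(x);                    /* compare rows on every column (assumption) */

call sortndx(idx, x, cols);          /* row order that sorts x by those columns */
firstLoc = uniqueby(x, cols, idx);   /* positions, in sorted order, where each
                                        distinct row first appears */
numUnique  = nrow(firstLoc);         /* number of distinct rows */
uniqueRows = x[idx[firstLoc], ];     /* one representative of each distinct row */

print numUnique, uniqueRows;
quit;
```

Note that `x` is never rearranged here; everything is done through the index vector, which is the situation SORTNDX is designed for.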