Hi! I need to sort rather large arrays in the DATA step. All values are numeric and unique. The number of values is up to 50,000, perhaps more.
Is Quicksort the best?
Where can I find the best SAS code for Quicksort?
What alternatives to Quicksort are there?
I use SAS ODA (OnDemand for Academics). I want to write code that can be published and is very general.
Many thanks in advance!
(I have been googling for a while. It is not easy to find the best answer.)
/Br AndersS
Hi:
My tendency would be to use CALL SORTN, which is designed for sorting numeric array members. I did find a reference to "QUICKSORT" in this older user group paper by Paul Dorfman: https://support.sas.com/resources/papers/proceedings/proceedings/sugi26/p096-26.pdf . However, I believe the paper may predate the introduction of CALL SORTN.
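A minimal sketch of the CALL SORTN approach, assuming the values sit in one observation as numeric variables x1-x50000. The variable names, the data set name SORTED, and the random test values are illustrative assumptions, not part of the original question.

```sas
/* Illustrative data: one observation holding 50,000 random values */
data sorted;
  array x[50000] x1-x50000;
  call streaminit(123);            /* reproducible random numbers */
  do i = 1 to dim(x);
    x[i] = rand('uniform');
  end;
  /* Sort the values of all array elements in place, ascending */
  call sortn(of x[*]);
  drop i;
run;
```

CALL SORTN moves the values across the variables of the current observation: x1 ends up holding the smallest value and x50000 the largest, while the variable names stay where they are.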
The size of the array may be limited by memory; this previous forum thread discusses memory as a limiting factor in array size: https://communities.sas.com/t5/SAS-Programming/what-s-the-limit-to-how-many-elements-variables-a-SAS... .
I'm sure that others with more experience sorting arrays will have additional feedback.
Cynthia
Fifty thousand variables in a single observation? Really?
Because otherwise an array makes no sense.
Hi! YES!
p.s.
The limit in SAS ODA is around 250 million values.
Hi! I have run some tests on the Linux server for SAS ODA.
I fitted a quadratic model with PROC NLIN:
CPU time in seconds for CALL SORTN versus
SIZE in millions of array elements.
(The examples in the SAS documentation are good; just cut and paste.)
Result: an almost straight line.
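A test of this kind could be set up roughly as follows. This is a sketch under stated assumptions, not the code behind the results above: the macro name %time_sortn, the array sizes, and the data set names are hypothetical, and DATETIME() measures elapsed (wall-clock) time rather than the CPU time reported in the post.

```sas
/* Hypothetical harness: time CALL SORTN for several array sizes */
%macro time_sortn(sizes);
  %local k n;
  %do k = 1 %to %sysfunc(countw(&sizes));
    %let n = %scan(&sizes, &k);
    data _timing_&k;
      array x[&n] _temporary_;      /* temporary array: no variables created */
      call streaminit(&k);
      do i = 1 to &n;
        x[i] = rand('uniform');
      end;
      t0 = datetime();              /* elapsed (wall-clock) time, not CPU */
      call sortn(of x[*]);
      seconds = datetime() - t0;
      size_millions = &n / 1e6;
      keep size_millions seconds;
    run;
  %end;
  /* Stack the one-row results into a single table */
  data timings;
    set _timing_:;
  run;
%mend time_sortn;

%time_sortn(1000000 2000000 4000000 8000000 16000000)

/* Quadratic model of time versus size, as described in the post */
proc nlin data=timings;
  parms a=0 b=0.1 c=0;
  model seconds = a + b*size_millions + c*size_millions**2;
run;
```

Because this model is linear in a, b and c, PROC REG with a squared size term would give the same least-squares estimates; PROC NLIN simply keeps the model written out explicitly.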
Sorting methods often behave almost linearly (like O(N log N)) for "small models"
and more quadratically (like O(N²)) for "large models".
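As a rough worked check of why an O(N log N) method can plot as a near-straight line, compare the growth from a hypothetical 1 million to 100 million elements (the ratio is the same for any base of the logarithm):

```latex
\frac{N_2 \log N_2}{N_1 \log N_1}
  = \frac{10^{8} \, \log 10^{8}}{10^{6} \, \log 10^{6}}
  = 100 \cdot \frac{8}{6}
  \approx 133
```

A strictly linear method would grow by a factor of 100 and an O(N²) method by a factor of 10,000. Over two orders of magnitude, N log N runs only about a third above linear, so a quadratic fit to such timings would be expected to show only a small squared term.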
Transpose the data and use PROC SORT.
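A minimal sketch of the transpose-and-sort route, assuming a one-row input data set WIDE with numeric variables x1-x50000 (the data set names, variable names, and random test values are illustrative assumptions):

```sas
/* Illustrative input: one row with numeric variables x1-x50000 */
data wide;
  array x[50000] x1-x50000;
  call streaminit(42);
  do i = 1 to dim(x);
    x[i] = rand('uniform');
  end;
  drop i;
run;

/* 1. Wide to long: one row per value (COL1 renamed to VALUE) */
proc transpose data=wide out=long(rename=(col1=value));
  var x1-x50000;
run;

/* 2. Sort the values with PROC SORT */
proc sort data=long;
  by value;
run;

/* 3. Optional: long back to wide, giving x1-x50000 in ascending order */
proc transpose data=long out=wide_sorted(drop=_name_) prefix=x;
  var value;
run;
```

This involves more steps and I/O than a single CALL SORTN statement, but the long form is often the more natural layout for SAS procedures, and step 3 can be skipped if the sorted values are only needed as rows.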
Like @Cynthia_sas, I would use the CALL SORTN subroutine. It works in a single data step, and sorting 50,000 numeric variables shouldn't add very much demand for memory.
And even if the quicksort algorithm were faster, you would need to include a lot more code (probably as a macro) than the single CALL SORTN statement. Given that your objective is to write code that "can be published and is very general", you might be better off using the simplest SAS code possible.
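To give a sense of how much more code a hand-written sort takes, here is a hypothetical sketch of an iterative quicksort in a DATA step. It is not Dorfman's code from the paper above; it uses a standard middle-pivot, Hoare/Wirth-style partition with an explicit stack of pending ranges, and the array sizes and data set name QS_SORTED are illustrative assumptions.

```sas
/* Hypothetical sketch: iterative quicksort on a temporary array */
data qs_sorted(keep=value);
  array x[50000] _temporary_;      /* values to sort (random test data)  */
  array lo_stk[64] _temporary_;    /* explicit stack of pending ranges   */
  array hi_stk[64] _temporary_;
  call streaminit(2025);
  do k = 1 to dim(x);
    x[k] = rand('uniform');
  end;

  top = 1; lo_stk[1] = 1; hi_stk[1] = dim(x);
  do while (top > 0);
    lo = lo_stk[top]; hi = hi_stk[top]; top = top - 1;
    do while (lo < hi);
      /* Partition around the middle element */
      p = x[floor((lo + hi) / 2)];
      i = lo; j = hi;
      do while (i <= j);
        do while (x[i] < p); i = i + 1; end;
        do while (x[j] > p); j = j - 1; end;
        if i <= j then do;
          t = x[i]; x[i] = x[j]; x[j] = t;
          i = i + 1; j = j - 1;
        end;
      end;
      /* Push the larger part and keep looping on the smaller one, */
      /* so the stack needs only about log2(N) slots               */
      if j - lo < hi - i then do;
        if i < hi then do; top = top + 1; lo_stk[top] = i; hi_stk[top] = hi; end;
        hi = j;
      end;
      else do;
        if lo < j then do; top = top + 1; lo_stk[top] = lo; hi_stk[top] = j; end;
        lo = i;
      end;
    end;
  end;

  /* Write the sorted values out, one per row, for inspection */
  do k = 1 to dim(x);
    value = x[k];
    output;
  end;
run;
```

Quicksort averages O(N log N) comparisons, but its worst case is O(N²); the middle pivot only makes that case unlikely for typical data. That extra risk and maintenance burden is part of the argument for CALL SORTN.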