Patrick's example code does not modify the source data set source.bbbbb. It READS from that data set to create subsets for sorting. His code then puts the sorted subsets back together, NOT overwriting the source data set.

Also, see this paper on different sort methods with SAS: https://www.lexjansen.com/nesug/nesug13/29_Final_Paper.pdf

Optimal sorting (in terms of time and/or disk usage) depends on how much memory you have available and on your disk I/O. Since you seem to be short on storage space, consider using indexes. See Example 3 in the linked PDF. You'll trade longer running time for reduced storage space. Since you aren't modifying the LIVE data set directly, you'll need to make a copy of the data set and create the INDEX on that copy. If you can read the data set into memory (100+ GB of memory), then you'll get really fast performance once the data has been copied to memory.

Furthermore, there are variations you can try. Assuming your keys are unique, you could use hash techniques to sort subsets of your data in memory and then combine the data back together. (I think you need to be careful about how you split your data set into pieces: you want to be able to simply stack the sorted data produced from each subset.) This requires time to copy data to memory (and enough memory), but if disk storage or disk I/O is a big concern then it might give you better performance. I think using a hash object (or objects) will do better than INDEXES if you can't fit your entire data set into memory.
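Here is a rough, untested sketch of the two ideas above: an index on a copy, and sorting subsets with a hash object. The key variable id, the cutoff value 1000000, and the WORK data set names are all made up for illustration; substitute your own key and a split that suits your data. The hash part assumes the keys are unique, since without MULTIDATA a hash object keeps only one row per key.

```sas
/* Option 1: copy the source and build a simple index on the copy.   */
/* "id" is a hypothetical key variable.                               */
data work.copy (index=(id));
    set source.bbbbb;
run;

/* Reading with BY uses the index to return rows in key order,       */
/* so no PROC SORT (and no sort utility file) is needed.              */
data _null_;
    set work.copy;
    by id;
    /* ... processing in key order ... */
run;

/* Option 2: sort one key range at a time in memory with a hash.     */
/* The cutoff 1000000 is a hypothetical split point.                  */
data _null_;
    if 0 then set source.bbbbb;   /* defines the variables, reads no rows */
    declare hash h(dataset: "source.bbbbb(where=(id < 1000000))",
                   ordered: "a");
    h.defineKey("id");
    h.defineData(all: "yes");
    h.defineDone();
    h.output(dataset: "work.part1");   /* written in ascending key order */
    stop;
run;

data _null_;
    if 0 then set source.bbbbb;
    declare hash h(dataset: "source.bbbbb(where=(id >= 1000000))",
                   ordered: "a");
    h.defineKey("id");
    h.defineData(all: "yes");
    h.defineDone();
    h.output(dataset: "work.part2");
    stop;
run;

/* Because the pieces were split on key ranges, stacking them in     */
/* range order gives a fully sorted result.                           */
data work.sorted;
    set work.part1 work.part2;
run;
```

Splitting on key ranges (rather than on row position) is what lets the sorted pieces be stacked without a merge step. Each piece has to fit in memory on its own, so pick as many ranges as you need for that.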