Hello All,
I've encountered with below question in one of the interview, Could anyone please help me to nail this down.
"What is the preferred way / Approach to SORT the dataset which having the size of 500GB residing in Total Disk Size is 600GB ?".
Regards,
Jai
It's a trick question. If you have a total disk size of 600GB, and one data set occupies 500GB of that space, there isn't enough space available to sort the data.
They must be looking for the TAGSORT option, which sorts only the keys.
TAGSORT can't be use, As we have only 600GB of Space with 500GB of data, TAGSORT would also consume Disk, However less space than normal sort, though it is not efficient option to use when we have only 100 GB disk spare for 500 GB Table.
We have to figure out other options.
Regards,
Jai
Another option is the sortdev system option (on Windows, at least) to tell the system to put the temporary sort files elsewhere.
It is also a pretty dumb question. If the intention is to find out whether you know about TAGSORTing when it is rarely if ever used in practice, it doesn't really tell you anything about the competence of the candidate.
But even if tagsort is the object of this so-called test question, it won't work. There has to be a moment during which both the original and the sorted copy of the entire data set are on the disk before SAS is satisfied to tell the op system to delete the original.
It's not a trick question. It's not even a question. It's a fantasy - this could only work if one could reduce the size of the original while building the sorted copy. Not going to happen, certainly not in SAS.
I guess you could INDEX the data set by the sort variable(s). Then you could use a BY statement to retrieve in sorted order. But that's not the same as sorting the data set.
I'm looking for the best approach to nail this down, Can you suggest any other approach.
Regards,
Jai
First, clarify the question. If you have 600 GB of free space available, TAGSORT is a good answer. If you only have 100 GB of free space available, nothing will work. Also note, last I checked, some operating systems did not support TAGSORT. So validate whether it is available. If it is not, you can still write the equivalent of TAGSORT using a SAS program.
@jaiganesh wrote:I'm looking for the best approach to nail this down, Can you suggest any other approach.
Regards,
Jai
Is this an interview question or a real world question? Because the answers could be different.
Best approach? Add more disk space to support your requirements. Probably won't work in an interview, perhaps not in real world either. But if your hardware doesn't support your data requirements...
Other approaches:
1) Create an index on your sort columns. Depending on the number of columns and their width, the index may fit in the 100GB. You should be able to do a rough calculation by (bytes per key columns) * (number of rows). If you Google, you should find the exact formula for calculating the index size.
2) Once that is done, if the dataset must be physically sorted, allocate a libref with at least 500GB space (preferably more), use a data step + BY statement + Index to write the dataset to the libref, delete the existing dataset (hope you have a backup), and write the dataset back.
Start the process on Friday afternoon, and hopefully it's finished by Monday morning.
If this really is an interview question, don't just run with this. Hit the docs on SAS indexes, how they're created, how they're used, etc.
Edit: Oops, @mkeintz already suggested this - I must read previous answers more closely rather than just a quick skim! 😉
The following code could give you a hint . (scan table once and combine them together) .
Make a macro for your big table.
data F M;
set sashelp.class;
if sex='F' then output F;
else if sex='M' then output M;
run;
data want;
set F M; /*notice the order of table name*/
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.