BookmarkSubscribeRSS Feed
jaiganesh
Obsidian | Level 7

Hello All,

 

 

I've encountered with below question in one of the interview, Could anyone please help me to nail this down.

 

 

"What is the preferred way / Approach to SORT the dataset which having the size of 500GB residing in Total Disk Size is 600GB ?".

 

 

Regards,

Jai

10 REPLIES 10
Astounding
PROC Star

It's a trick question.  If you have a total disk size of 600GB, and one data set occupies 500GB of that space, there isn't enough space available to sort the data.

PGStats
Opal | Level 21

They must be looking for the TAGSORT option, which sorts only the keys.

PG
jaiganesh
Obsidian | Level 7

TAGSORT can't be use, As we have only 600GB of Space with 500GB of data, TAGSORT would also consume Disk, However less space than normal sort, though it is not efficient option to use when we have only 100 GB disk spare for 500 GB Table.

 

We have to figure out other options.

 

 

Regards,

Jai

PGStats
Opal | Level 21

Another option is the sortdev system option (on Windows, at least) to tell the system to put the temporary sort files elsewhere.

PG
SASKiwi
PROC Star

It is also a pretty dumb question. If the intention is to find out whether you know about TAGSORTing when it is rarely if ever used in practice, it doesn't really tell you anything about the competence of the candidate.

mkeintz
PROC Star

But even if tagsort is the object of this so-called test question, it won't work.   There has to be a moment during which both the original and the sorted copy of the entire data set are on the disk before SAS is satisfied to tell the op system to delete the original.

 

It's not a trick question.  It's not even a question.  It's a fantasy - this could only work if one  could reduce the size of the original while building the sorted copy.  Not going to happen, certainly not in SAS.

 

 

I guess you could INDEX the data set by the sort variable(s).  Then you could use a BY statement to retrieve in sorted order.  But that's not the same as sorting the data set.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
jaiganesh
Obsidian | Level 7

I'm looking for the best approach to nail this down, Can you suggest any other approach.

 

 

 

Regards,

Jai

 

Astounding
PROC Star

First, clarify the question.  If you have 600 GB of free space available, TAGSORT is a good answer.  If you only have 100 GB of free space available, nothing will work.  Also note, last I checked, some operating systems did not support TAGSORT.  So validate whether it is available.  If it is not, you can still write the equivalent of TAGSORT using a SAS program.

ScottBass
Rhodochrosite | Level 12

@jaiganesh wrote:

I'm looking for the best approach to nail this down, Can you suggest any other approach.

 

 

 

Regards,

Jai

 


 

Is this an interview question or a real world question?  Because the answers could be different.

 

Best approach?  Add more disk space to support your requirements.  Probably won't work in an interview, perhaps not in real world either.  But if your hardware doesn't support your data requirements...

 

Other approaches:

 

1) Create an index on your sort columns.  Depending on the number of columns and their width, the index may fit in the 100GB.  You should be able to do a rough calculation by (bytes per key columns) * (number of rows).  If you Google, you should find the exact formula for calculating the index size.

 

2) Once that is done, if the dataset must be physically sorted, allocate a libref with at least 500GB space (preferably more), use a data step + BY statement + Index to write the dataset to the libref, delete the existing dataset (hope you have a backup), and write the dataset back.

 

Start the process on Friday afternoon, and hopefully it's finished by Monday morning.

 

If this really is an interview question, don't just run with this.  Hit the docs on SAS indexes, how they're created, how they're used, etc.

 

Edit:  Oops, @mkeintz already suggested this - I must read previous answers more closely rather than just a quick skim! 😉

 


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
Ksharp
Super User

The following code could give you a hint . (scan table once and combine them together) .

Make a macro for your big table.

 

data F M;

 set sashelp.class;

if sex='F' then output F;

 else if sex='M' then output M;

run;

 

data want;

 set F M;  /*notice the order of table name*/

run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 896 views
  • 3 likes
  • 7 in conversation