neha_sas
Calcite | Level 5

Hi,

I am trying to sort a very large dataset (approx. 32 GB) on 4 variables. I tried using the TAGSORT option but got this error message:

ERROR: TAGSORT option cannot be used with engines that do not support random access.

Any suggestions as to how this error can be avoided? Or how can I sort the dataset more efficiently?

Thanks,

Neha

Astounding
PROC Star

Why would you not have random access?  Is your data stored on tape (or on disk, but in tape format)?

If you can run this program, there is probably a solution:

data _null_;
   _n_=50;
   set have point=_n_;
   stop;
run;

SAS Press publishes a book on efficiency (author is Virgile).  The chapter on sorting contains a small section on how to program your own TAGSORT, but it requires point= as part of the solution.
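A rough sketch of the general idea behind a hand-written TAGSORT (this is not the code from Virgile's book): sort only the four BY variables plus each row's position, then read the full rows back in that order with POINT=. The data set names KEYS and WANT, the variable ROWNUM, the BY variables var1-var4 and the demo data are all hypothetical, and step 3 only works on an engine that supports random access.

```sas
/* Demo stand-in for the real HAVE (hypothetical variables).      */
/* Skip this step when running against your real data.            */
data have;
  length payload $20;
  do i = 1 to 20;
    var1 = mod(i * 7, 3);
    var2 = mod(i * 3, 5);
    var3 = mod(i, 2);
    var4 = 21 - i;
    payload = cats('row', i);
    output;
  end;
  drop i;
run;

/* Step 1: keep only the BY variables plus each row's position */
data keys;
  set have(keep=var1 var2 var3 var4);
  rownum = _n_;
run;

/* Step 2: sort the narrow key file, which needs far less space than HAVE */
proc sort data=keys;
  by var1 var2 var3 var4;
run;

/* Step 3: fetch whole rows in sorted order; POINT= needs random access. */
/* The step ends when the first SET runs out of KEYS observations.       */
data want;
  set keys(keep=rownum);
  set have point=rownum;
  drop rownum;
run;
```

The space saving comes from step 2 sorting only the key columns; the price is one random read per observation in step 3, which is why this is not necessarily faster.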

A secondary question is why you need TAGSORT.  Maybe this should be the primary question.  Is the data set so large that you can't get the sort work space?  Are you hoping that TAGSORT will get the program to run faster?

Good luck.

neha_sas
Calcite | Level 5

Thanks for the reply, but the dataset really is large (on the order of 30 GB), and the main reason for using TAGSORT is to save work space.

Astounding
PROC Star

If work space is the limiting factor, it will help to have other disk space you can use. Split up the data, sort each piece into that other space, and then interleave the sorted pieces. For illustration, assume you have 90M observations:

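/* OUT is a libref assigned to disk space outside WORK; */
/* var1-var4 stand for your four BY variables.           */
proc sort data=have (firstobs=1 obs=10000000) out=out._1_;
   by var1 var2 var3 var4;
run;

proc sort data=have (firstobs=10000001 obs=20000000) out=out._2_;
   by var1 var2 var3 var4;
run;

...

proc sort data=have (firstobs=80000001) out=out._9_;
   by var1 var2 var3 var4;
run;

/* Interleave the sorted pieces into one sorted data set */
data want;
   set out._1_ out._2_ ... out._9_;
   by var1 var2 var3 var4;
run;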

You may even find that this runs faster. The SAS sorting mechanism has a small component that is proportional to the square of the number of observations, so several smaller sorts can cost less than one large one.
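Rather than writing the nine PROC SORT steps by hand, the pieces can be generated with a macro loop. This is a sketch that assumes you already know the observation count; the macro name SORT_CHUNKS, its parameters, the demo data and var1-var4 are hypothetical. In practice LIB= would point at a library on disk outside WORK.

```sas
/* Demo stand-in for HAVE: 25 rows, hypothetical BY variables.   */
/* Skip this step when running against your real data.           */
data have;
  do i = 1 to 25;
    var1 = mod(i * 7, 3);
    var2 = mod(i * 5, 4);
    var3 = mod(i, 2);
    var4 = 26 - i;
    output;
  end;
  drop i;
run;

%macro sort_chunks(data=, lib=, nobs=, chunk=, by=);
  %local i n;
  /* Number of pieces, rounded up */
  %let n = %sysevalf(&nobs / &chunk, ceil);
  %do i = 1 %to &n;
    /* Sort rows (i-1)*chunk+1 through i*chunk; OBS= past the end just reads to the end */
    proc sort data=&data(firstobs=%eval((&i - 1) * &chunk + 1) obs=%eval(&i * &chunk))
              out=&lib.._&i._;
      by &by;
    run;
  %end;
  /* Interleave the sorted pieces into one sorted data set */
  data want;
    set %do i = 1 %to &n; &lib.._&i._ %end;;
    by &by;
  run;
%mend sort_chunks;

%sort_chunks(data=have, lib=work, nobs=25, chunk=10, by=var1 var2 var3 var4)
```

For the 90M-observation case above, the call would look something like `%sort_chunks(data=have, lib=out, nobs=90000000, chunk=10000000, by=var1 var2 var3 var4)`, producing OUT._1_ through OUT._9_.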

If your application does not need observations with identical BY values to keep their original relative order (and if your operating system supports it), the NOEQUALS option may speed things up as well.
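A minimal sketch of NOEQUALS on PROC SORT. The demo data and var1-var4 are hypothetical; the only change from an ordinary sort is the option, which lets the sort skip preserving the input order within BY groups.

```sas
/* Demo stand-in for HAVE (hypothetical variables); skip with real data */
data have;
  do i = 1 to 12;
    var1 = mod(i, 3);
    var2 = mod(i, 2);
    var3 = 1;
    var4 = mod(i * 5, 4);
    output;
  end;
  drop i;
run;

/* NOEQUALS: ties on the BY variables may come out in any order */
proc sort data=have out=want noequals;
  by var1 var2 var3 var4;
run;
```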

Good luck.

Doc_Duke
Rhodochrosite | Level 12

The POINT= functionality that Astounding references also requires random access. TAGSORT will generally save space during the sort (it is not necessarily faster), but you will first need to copy the data to a standard SAS engine (I suspect you have a transport or EXPORT engine format dataset).

Doc Muhlbaier

Duke
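A sketch of that copy step, assuming the tape-format data set is named HAVE. The librefs TAPELIB and DISKLIB, both paths, and var1-var4 are hypothetical placeholders. The V9 library needs room for a full copy of the data (about 32 GB here) plus whatever utility space TAGSORT itself uses.

```sas
/* Hypothetical paths: point these at your own locations */
libname tapelib v9tape '/path/to/tape/library';  /* sequential-format source        */
libname disklib v9     '/path/to/big/disk';      /* standard random-access engine   */

/* Copy HAVE from the sequential library into the standard V9 library */
data disklib.have;
  set tapelib.have;
run;

/* TAGSORT is now allowed, because the V9 engine supports random access */
proc sort data=disklib.have out=disklib.want tagsort;
  by var1 var2 var3 var4;   /* hypothetical names for the four BY variables */
run;
```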

neha_sas
Calcite | Level 5

Hey Duke,

I checked the engine format for my dataset and it's V9TAPE. How can I copy it to a standard V9 format?

Thanks.
