BookmarkSubscribeRSS Feed
SSN_Ravi
Calcite | Level 5
Hi all,

How to sort dataset without using SORT procedure.

Thanks in advance...

--------
SSN Ravi
10 REPLIES 10
LawrenceHW
Quartz | Level 8
You can use a PROC SQL (using the ORDER BY) as an alternative to PROC SORT.

Why would you not want to use PROC SORT?
deleted_user
Not applicable
I agree Lawrence, the Sort procedure produces a table with records ordered in a specific sequence. The order by statement in SQL does the same thing, but the downsides are that the table is replaced by the ordered version, and that it will take a piece of time and system resources to produce the result.

I like the WW2 underground posters that say "is your journey really necessary?" they remind me to check that what I am doing is not just a waste of time and energy, and reordering data is one of those cases.

Sometimes, what is needed is data that can be read in a specific order, and the addition of one or more indexes preserves the original order of the data but makes the data accessible in a different sequence for those procedures and steps where a different vector is required into the data. It costs the time to generate the indexes, and the space to store them, but in some cases (especially with large tables with a number of logical query vectors) it is a reasonable exchange.

Kind regards

David
deleted_user
Not applicable
I agree Lawrence, the Sort procedure produces a table with records ordered in a specific sequence. The order by statement in SQL does the same thing, but the downsides are that the table is replaced by the ordered version, and that it will take a piece of time and system resources to produce the result.

I like the WW2 underground posters that say "is your journey really necessary?" they remind me to check that what I am doing is not just a waste of time and energy, and reordering data is one of those cases.

Sometimes, what is needed is data that can be read in a specific order, and the addition of one or more indexes preserves the original order of the data but makes the data accessible in a different sequence for those procedures and steps where a different vector is required into the data. It costs the time to generate the indexes, and the space to store them, but in some cases (especially with large tables with a number of logical query vectors) it is a reasonable exchange.

Kind regards

David


[Reposted: spell.jspa script seemed to hang during posting of message.]
LawrenceHW
Quartz | Level 8
Dave,

Completely agree with your philosophy regarding unecessary sorts. However, in clinical trials (especially working in a CRO), indexes (indices?), are not really worth the hassle of setting them up. The datasets are not, in general, that large and we're not updating data sets. We're overwriting them (i.e. getting new dumps of data) so we'd have to re-create the index every time we got a new batch of data.

In general terms, the efficiencies required for the clinical trials environment are to be gained via maintenance of code (i.e. commenting, planning your programs, making them easy to read, documentation, etc.). However, avoiding unnecessary sorts (using SORT or SQL) is a desirable thing as it means you've thought about the program layout and design.

Cheers,
Lawrence
deleted_user
Not applicable
Clearly it's been a long time since i was involved in clin trials data Lawrence. I thought one of the issues facing you was proving that the data were unchanged between a report and a review, and the date time stamps on the table were an element of that proof. If you sort the table, it is replaced and the date time is altered, although there are some new SAS options to preserve existing date time values between steps.

I recognise the tables are small, but I am also concerned with the silliness that happens when someone misplaces a semicolon and kills the contents of a data set. Easy mistake to make, hard one to fix if you don't have backups.

Kind regards

David
LawrenceHW
Quartz | Level 8
Dave,

In theory, yes that is what we want to show. However, in practice, it doesn't work out like that. We need to manipulate our raw data into analysis data sets. These analysis data sets are then interrogated and summary tables, listings and figures produced from them.

Until we get the magic version control/audit trail piece of software which can automatically track input and output from a SAS program including datasets, log files and output files (including proc printtos, ODS destinations, etc.), we'll never have that level of "proof".

Before anyone at SAS mentions Drug Development, it's not the answer (at least none of the demos and detailed talks I've seen and had with the developers have shown this). It's a good attempt but there's a long way to go to make it the version control/audit trail of our wishes.

Lawrence
1162
Calcite | Level 5
Maybe this is getting off topic, but wouldn't it be better to create a copy of the master data set and sort that "working" copy? I haven't analyzed many clinical trial datasets, but for the few that I have done, I set up the original SAS datasets in a read-only library and make copies when I have to manipulate data.

Also, some procedures don't require sorting. You could use the CLASS statement instead of the BY statement for some PROCs. For reporting results, I've used an approach where I determine order using a numeric column (you really have to know what the rows will be) and then order the data as part of the DEFINE statement in a PROC REPORT procedure.
LawrenceHW
Quartz | Level 8
True some procedures don't require the data to be sorted before using. But SAS is still going to have to go through the sorting process somewhere ....
deleted_user
Not applicable
Generally I have no quarrel with that statement Lawrence, I suspect that the procedure option UnSorted may cause a sort to be performed at some point. And indeed checking "behind the scenes" of the SQL Procedure one can find SQXSort was called because SAS decided it was necessary.

However, for the Means / Summary twins, the use of a Class statement causes a table to be built with a row for each level, which is updated with counts as source rows are identified. New class types are inserted as required at the appropriate type / lexicographical level, so no sort is performed. Since this occurs in memory, this can be a burden on the available system resources, which is why "Sort and By" is the standard cure for failures of the Class statement. At least that is as I learned it.

Kind regards

David
deleted_user
Not applicable
would it be acceptable to view the tables in the SAS System Viewer? That allows sorting of the view, without updating content.

Alternatively, discover how easy it can be to create indexes, and open a view of the data with the IDXWHERE=YES option. From there it would not be difficult to load the view through the index ~~> showing the content in the order of that index. Multiple orders require only multiple indexes. I have done this with extra icons over the viewtable view ~ one icon for each "order".

PeterC

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

New Learning Events in April

 

Join us for two new fee-based courses: Administrative Healthcare Data and SAS via Live Web Monday-Thursday, April 24-27 from 1:00 to 4:30 PM ET each day. And Administrative Healthcare Data and SAS: Hands-On Programming Workshop via Live Web on Friday, April 28 from 9:00 AM to 5:00 PM ET.

LEARN MORE

Discussion stats
  • 10 replies
  • 1692 views
  • 0 likes
  • 4 in conversation