PROC SORT is not good/efficient for BIG table.
Here is the best way to sort BIG table:
p.s. WANT table is what you are looking for.
%let dsn= sashelp.heart ;
%let var= Smoking_Status ;
/***********************/
%let dsid=%sysfunc(open(&dsn.));
%let vnum=%sysfunc(varnum(&dsid.,&var.));
%let vtype=%sysfunc(vartype(&dsid.,&vnum.));
%let dsid=%sysfunc(close(&dsid.));
proc sql;
create table _levels as
select distinct &var. as levels from &dsn.;
quit;
data _levels;
set _levels;
n+1;vtype="&vtype.";
run;
filename x temp;
data _null_;
file x;
put 'data ';
do until(last1);
set _levels end=last1;
put '_' n;
end;
put ";set &dsn.;select(&var.);";
do until(last2);
set _levels end=last2;
if upcase(vtype)='C' then put 'when("' levels +(-1) '") output _' n ';';
else put 'when(' levels ') output _' n ';';
end;
put 'otherwise;end;run; data want;set ';
do until(last3);
set _levels end=last3;
put '_' n;
end;
put ';run;';
stop;
run;
%include x;
Hi @pavank,
the Article "How to sort data in SAS" by Andy Rav (https://communities.sas.com/t5/SAS-Communities-Library/How-to-sort-data-in-SAS/ta-p/676170) which also recommends Chris Hemedingers blog (https://blogs.sas.com/content/sasdummy/2016/02/04/avoid-sorting-data-in-sas/) should be a good starting point for our question.
Best Regards
Markus
PROC SORT is not good/efficient for BIG table.
Here is the best way to sort BIG table:
p.s. WANT table is what you are looking for.
%let dsn= sashelp.heart ;
%let var= Smoking_Status ;
/***********************/
%let dsid=%sysfunc(open(&dsn.));
%let vnum=%sysfunc(varnum(&dsid.,&var.));
%let vtype=%sysfunc(vartype(&dsid.,&vnum.));
%let dsid=%sysfunc(close(&dsid.));
proc sql;
create table _levels as
select distinct &var. as levels from &dsn.;
quit;
data _levels;
set _levels;
n+1;vtype="&vtype.";
run;
filename x temp;
data _null_;
file x;
put 'data ';
do until(last1);
set _levels end=last1;
put '_' n;
end;
put ";set &dsn.;select(&var.);";
do until(last2);
set _levels end=last2;
if upcase(vtype)='C' then put 'when("' levels +(-1) '") output _' n ';';
else put 'when(' levels ') output _' n ';';
end;
put 'otherwise;end;run; data want;set ';
do until(last3);
set _levels end=last3;
put '_' n;
end;
put ';run;';
stop;
run;
%include x;
Some things to consider other than the size of the data set:
What kind of values are in the BY variables? Some of the options available in SORTSEQ, such as LINGUISTIC and/or NUMERIC_COLLATION may make Proc sort preferred over some simpler approaches because the desired result order may be attainable using them but not in other tools.
For example with the NUMERIC_COLLATION you can get text values that include characters to include the numeric value of the numerals and not the character coding only. This means for example that in sorting street addresses where you have a name and number that you get
11 Main St
45 Main St
123 Main St
145 Main St
1150 Main St
in that order instead of
11 Main st
1150 Main St
123 Main St
145 Main St
45 Main St
Hi @ballardw ,
Interesting system options for proc sort.
Are those options system specific? Couldn't find clear info in documentation (ref: https://documentation.sas.com/doc/en/proc/3.2/p02bhn81rn4u64n1b6l00ftdnxge.htm#p0o1lj0p3exoifn1rmqvm...).
When I submit the code below this option doesn't work and log has an error message.
data have;
infile cards dlm=',';
length name $20;
input name $;
cards;
11 Main St,
1150 Main St,
45 Main St,
123 Main St,
145 Main St
;
run;
options sortseq=LINGUISTIC(numeric_collation=on);
proc sort data=have out=foo1;
by name;
run;
442 options sortseq=LINGUISTIC(numeric_collation=on) ---------------------- 13 442! ; ERROR 13-12: Unrecognized SAS option name (NUMERIC_COLLATION=ON). 443 proc sort data=have out=foo1; 444 by name; 445 run;
I use SAS windowing environment, version: 9.04.01M5P091317
I will admit that there aren't good examples in the documentation. But the numericl collition option is for PROC SORT, not the OPTIONS statement AFAIK. The SORTSEQ option on the Options statement is basically for Language character set such as the sort order differences for "A" "a" "â" using a specific language. French might sort the "â" differently than Vietnamese for example.
proc sort data=have out=foo1 sortseq=LINGUISTIC(numeric_collation=on); by name; run;
Excellent, it's helpful knowing sortseq option of proc sort, thank you!
In my experience, PROC SORT handles MOST sorting requirements both effectively and efficiently and that is its biggest advantage. I suggest you try it for your use case and if it works efficiently then use it. Yes, there are times when huge datasets might require a different approach, but don't assume there are better sort methods for ALL datasets.
BTW, by keeping your data in sorted order you can often avoid a lot of sorting entirely.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.