BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
pavank
Quartz | Level 8
What are disadvantages of Proc sort precedure give some points which is best method for sorting data and delete duplicates data even if huge data and small data
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

PROC SORT is not good/efficient for BIG table.

Here is the best  way to sort BIG table:

p.s. WANT table is what you are looking for.

 

%let dsn= sashelp.heart ;
%let var= Smoking_Status ;






/***********************/
%let dsid=%sysfunc(open(&dsn.));
%let vnum=%sysfunc(varnum(&dsid.,&var.));
%let vtype=%sysfunc(vartype(&dsid.,&vnum.));
%let dsid=%sysfunc(close(&dsid.));
proc sql;
create table _levels as
select distinct &var. as levels from &dsn.;
quit;
data _levels;
 set _levels;
 n+1;vtype="&vtype.";
run;
filename x temp;
data _null_;
file x;
put 'data ';
do until(last1);
 set _levels end=last1;
 put '_' n;
end;
put ";set &dsn.;select(&var.);";
do until(last2);
 set _levels end=last2;
 if upcase(vtype)='C' then put 'when("' levels +(-1) '") output _' n ';';
  else put 'when(' levels ') output _' n ';';
end;
put 'otherwise;end;run; data want;set ';
do until(last3);
 set _levels end=last3;
 put '_' n;
end;
put ';run;';
stop;
run;
%include x;

View solution in original post

7 REPLIES 7
MarkusWeick
Barite | Level 11

Hi @pavank

 

the Article "How to sort data in SAS" by Andy Rav (https://communities.sas.com/t5/SAS-Communities-Library/How-to-sort-data-in-SAS/ta-p/676170) which also recommends Chris Hemedingers blog (https://blogs.sas.com/content/sasdummy/2016/02/04/avoid-sorting-data-in-sas/) should be a good starting point for our question.

 

Best Regards

Markus

Please help to keep the community friendly.
Like posts you agree with or like. Mark helpful answers as “accepted solutions”. Generally have a look at https://communities.sas.com/t5/Getting-Started/tkb-p/community_articles
Ksharp
Super User

PROC SORT is not good/efficient for BIG table.

Here is the best  way to sort BIG table:

p.s. WANT table is what you are looking for.

 

%let dsn= sashelp.heart ;
%let var= Smoking_Status ;






/***********************/
%let dsid=%sysfunc(open(&dsn.));
%let vnum=%sysfunc(varnum(&dsid.,&var.));
%let vtype=%sysfunc(vartype(&dsid.,&vnum.));
%let dsid=%sysfunc(close(&dsid.));
proc sql;
create table _levels as
select distinct &var. as levels from &dsn.;
quit;
data _levels;
 set _levels;
 n+1;vtype="&vtype.";
run;
filename x temp;
data _null_;
file x;
put 'data ';
do until(last1);
 set _levels end=last1;
 put '_' n;
end;
put ";set &dsn.;select(&var.);";
do until(last2);
 set _levels end=last2;
 if upcase(vtype)='C' then put 'when("' levels +(-1) '") output _' n ';';
  else put 'when(' levels ') output _' n ';';
end;
put 'otherwise;end;run; data want;set ';
do until(last3);
 set _levels end=last3;
 put '_' n;
end;
put ';run;';
stop;
run;
%include x;
ballardw
Super User

Some things to consider other than the size of the data set:

What kind of values are in the BY variables? Some of  the options available in SORTSEQ, such as LINGUISTIC and/or NUMERIC_COLLATION may make Proc sort preferred over some simpler approaches because the desired result order may be attainable using them but not in other tools.

 

For example with the NUMERIC_COLLATION you can get text values that include characters to include the numeric value of the numerals and not the character coding only. This means for example that in sorting street addresses where you have a name and number that you get

11  Main St

45 Main St

123 Main St

145 Main St

1150 Main St

in that order instead of

11 Main st

1150 Main St

123 Main St

145 Main St

45 Main St

A_Kh
Barite | Level 11

Hi @ballardw ,
Interesting system options for proc sort. 

Are those options system specific?  Couldn't find clear info in documentation (ref: https://documentation.sas.com/doc/en/proc/3.2/p02bhn81rn4u64n1b6l00ftdnxge.htm#p0o1lj0p3exoifn1rmqvm...). 

When I submit the code below this option doesn't work and log has an error message.  

data have; 
infile cards dlm=','; 
length name $20; 
input name $;
cards; 
11  Main St,
1150 Main St,
45 Main St,
123 Main St,
145 Main St
;
run; 

options sortseq=LINGUISTIC(numeric_collation=on);
proc sort data=have out=foo1;                                          
by name;                                                                        
run;      
442  options sortseq=LINGUISTIC(numeric_collation=on)
                                                    ----------------------
                                                    13
442! ;
ERROR 13-12: Unrecognized SAS option name (NUMERIC_COLLATION=ON).
443  proc sort data=have out=foo1;
444  by name;
445  run;

I use SAS windowing environment,  version: 9.04.01M5P091317


ballardw
Super User

I will admit that there aren't good examples in the documentation. But the numericl collition option is for PROC SORT, not the OPTIONS statement AFAIK. The SORTSEQ option on the Options statement is basically for Language character set such as the sort order differences for "A" "a" "â" using a specific language. French might sort the "â" differently than Vietnamese for example.

 

proc sort data=have out=foo1 sortseq=LINGUISTIC(numeric_collation=on);                                         
by name;                                                                        
run;
A_Kh
Barite | Level 11

Excellent, it's helpful knowing sortseq option of proc sort, thank you!

SASKiwi
PROC Star

In my experience, PROC SORT handles MOST sorting requirements both effectively and efficiently and that is its biggest advantage. I suggest you try it for your use case and if it works efficiently then use it. Yes, there are times when huge datasets might require a different approach, but don't assume there are better sort methods for ALL datasets.

 

BTW, by keeping your data in sorted order you can often avoid a lot of sorting entirely.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 7356 views
  • 7 likes
  • 6 in conversation