Help using Base SAS procedures

Efficiency of sorting by character variables vs. numeric variables?

Contributor
Posts: 24


Hi,
Is there a major difference between sorting by character variables and sorting by numeric variables?

Scenario.
1. A dataset with three numeric variables, key1, key2, and key3,
sorted using "by key1 key2 key3".

2. A dataset with one character variable that concatenates the three keys, big_key (key1-key2-key3), e.g. "23-34-56",
then sorted using "by big_key".

Is there a performance difference?
(The dataset contains around 1 million records.)

Thanks
Frequent Contributor
Posts: 82

Re: Efficiency of sorting by character variables vs. numeric variables ?

Not sure, but if it is created just for this sort, it sounds like an unnecessary variable that just makes your big data set even bigger.
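To put a rough number on that, here is a back-of-the-envelope sketch in Python (the thread is about SAS; Python is only used for the arithmetic). In an uncompressed SAS data set a character variable occupies its full declared length in every observation, so a new key adds roughly records × length bytes. The $20 length is a hypothetical choice: "23-34-56" is only 8 characters, but the declared length has to fit the widest key.

```python
# Back-of-the-envelope estimate, not a SAS measurement.
def added_bytes(n_records: int, key_length: int) -> int:
    """Extra storage if every record carries a fixed-length character key."""
    return n_records * key_length

# Assumptions: 1 million records (from the question), hypothetical $20 key length.
extra = added_bytes(1_000_000, 20)
print(f"about {extra / 1_000_000:.0f} MB extra, uncompressed")  # about 20 MB extra, uncompressed
```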
Super User
Posts: 9,673

Re: Efficiency of sorting by character variables vs. numeric variables ?

I prefer to use 2 because it is faster.
Oops, I should point out that these two ways are different.
Maybe you should write some code to see how different they are.
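One way to see the difference, sketched in Python rather than SAS (an assumption made for illustration): sorting on the tuple (key1, key2, key3) compares the keys numerically, one after another, like BY key1 key2 key3, while sorting on an unpadded concatenated string compares character by character, so "23-..." sorts before "5-...". The records and the make_big_key helper are hypothetical.

```python
# Hypothetical records: (key1, key2, key3) as numbers.
records = [(23, 34, 56), (5, 1, 2), (23, 4, 7)]

def make_big_key(key1: int, key2: int, key3: int) -> str:
    """Unpadded concatenation, like big_key = "23-34-56"."""
    return f"{key1}-{key2}-{key3}"

# Like BY key1 key2 key3: numeric comparison, key by key.
by_numeric = sorted(records)
# Like BY big_key: character-by-character string comparison.
by_big_key = sorted(records, key=lambda r: make_big_key(*r))

print(by_numeric)  # [(5, 1, 2), (23, 4, 7), (23, 34, 56)]
print(by_big_key)  # [(23, 34, 56), (23, 4, 7), (5, 1, 2)]
```

So the two BY statements can produce different orders unless the string key is built so that its order matches the numeric order.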

Ksharp
Contributor
Posts: 24

Re: Efficiency of sorting by character variables vs. numeric variables ?

OK, I'll rephrase my question.

I have a huge dataset with three key variables:
key1, key2, key3.
I need to sort by all three variables (listing all of them in the BY statement).

What's the fastest way to sort?
1. Using the three numeric variables in the BY statement
2. Concatenating the three variables into a character variable, then sorting by that character variable
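If option 2 is used, the concatenated key only reproduces the numeric order when every component has the same width. A minimal Python sketch of that idea, assuming non-negative integer keys below 10**width (width=5 and the make_padded_key name are hypothetical); in SAS the analogous step would presumably zero-pad with a Zw. format.

```python
def make_padded_key(key1: int, key2: int, key3: int, width: int = 5) -> str:
    """Zero-pad each key to a fixed width so string order matches numeric order.

    Assumes non-negative integers below 10**width.
    """
    for k in (key1, key2, key3):
        if not 0 <= k < 10 ** width:
            raise ValueError(f"key {k} does not fit in {width} digits")
    return f"{key1:0{width}d}-{key2:0{width}d}-{key3:0{width}d}"

records = [(23, 34, 56), (5, 1, 2), (23, 4, 7)]
print(make_padded_key(23, 34, 56))  # 00023-00034-00056
# With padding, sorting on the string gives the same order as the numeric keys.
print(sorted(records, key=lambda r: make_padded_key(*r)) == sorted(records))  # True
```

Negative or fractional keys would need a different encoding, which is one more reason the three numeric BY variables are the simpler choice.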
Super Contributor
Posts: 3,174

Re: Efficiency of sorting by character variables vs. numeric variables ?

Honestly, why not simply run an experiment yourself and report the results back to the forum?

Take a subset of your "huge" input data file and do a few sorts with different techniques.
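A minimal timing harness of that kind, written in Python as a language-neutral sketch. It says nothing about SAS PROC SORT performance; in SAS the equivalent is running PROC SORT on the subset with OPTIONS FULLSTIMER and reading the log. The function name time_sorts and the synthetic keys in the range 0 to 999 are assumptions.

```python
import random
import time

def time_sorts(n: int, seed: int = 0) -> dict:
    """Time an in-memory sort on tuple keys vs. zero-padded string keys."""
    rng = random.Random(seed)
    # Hypothetical synthetic keys, each in 0..999.
    records = [(rng.randrange(1000), rng.randrange(1000), rng.randrange(1000))
               for _ in range(n)]
    big_keys = [f"{a:03d}-{b:03d}-{c:03d}" for a, b, c in records]

    timings = {}
    start = time.perf_counter()
    sorted(records)  # like BY key1 key2 key3
    timings["numeric_tuple"] = time.perf_counter() - start

    start = time.perf_counter()
    sorted(big_keys)  # like BY big_key
    timings["concatenated_string"] = time.perf_counter() - start
    return timings

print(time_sorts(100_000))
```

Building the string keys is excluded from the timing here; in SAS, creating big_key costs an extra DATA step pass that belongs in the comparison.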

Add OPTIONS such as FULLSTIMER, and also consider PROC SORT options such as EQUALS/NOEQUALS for performance.
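EQUALS keeps observations that have identical BY values in their original relative order, i.e. a stable sort; NOEQUALS drops that guarantee, which may save resources. Python's sorted() is guaranteed stable, so it can illustrate what EQUALS preserves (a sketch; the rows are hypothetical).

```python
# Hypothetical rows: (BY value, payload). Two rows tie on the BY value 1.
rows = [(1, "first"), (0, "x"), (1, "second")]

# Sort on the BY value only; Python's sort is stable, so the tied rows
# keep their input order, which is what EQUALS guarantees within a BY group.
stable = sorted(rows, key=lambda r: r[0])
print(stable)  # [(0, 'x'), (1, 'first'), (1, 'second')]
```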

Scott Barry
SBBWorks, Inc.
Contributor
Posts: 24

Re: Efficiency of sorting by character variables vs. numeric variables ?

Actually, I tried, and couldn't find any significant performance difference.
But there should be some theoretical basis from which we can conclude which approach is optimal.
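One piece of theory, assuming a comparison-based sort (the thread does not say which algorithm SAS uses internally): the number of key comparisons grows roughly as n log2 n whichever key is used, so the choice of key mainly changes the cost of each comparison (up to three numeric compares, 8 bytes each by default, versus one string compare). For 1 million records that is about 20 million comparisons, and for a data set of this size disk I/O may well matter more than the comparison cost, which would be consistent with not seeing a significant difference.

```python
import math

def estimated_comparisons(n: int) -> float:
    """Textbook n * log2(n) estimate for a comparison sort; constant factors ignored."""
    return n * math.log2(n)

# 1 million records, as in the question.
print(round(estimated_comparisons(1_000_000) / 1e6, 1), "million comparisons")  # 19.9 million comparisons
```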

Thanks
Super User
Posts: 10,483

Re: Efficiency of sorting by character variables vs. numeric variables ?

The TAGSORT option might be something to investigate for improving performance of sorts on large sets.
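As documented for SAS, TAGSORT stores only the BY variables and the observation numbers (the tags) in temporary files while sorting, then uses the tags to retrieve the full observations in sorted order. The idea, as a small Python sketch rather than SAS's actual implementation (the tag_sort name and records are hypothetical):

```python
def tag_sort(records, key):
    """Sort small (key, position) tags, then fetch the full records in tag order."""
    # Only the key and the record's position are sorted; the position also breaks ties.
    tags = sorted((key(rec), pos) for pos, rec in enumerate(records))
    # Second pass: retrieve each full record by its position.
    return [records[pos] for _, pos in tags]

# Hypothetical wide records: three keys followed by a large payload.
records = [(23, 34, 56, "payload a"), (5, 1, 2, "payload b"), (23, 4, 7, "payload c")]
result = tag_sort(records, key=lambda r: r[:3])
print([r[3] for r in result])  # ['payload b', 'payload c', 'payload a']
```

This helps most when records are wide relative to the key, because only the small tags are moved during the sort, at the cost of a second pass that fetches records in non-sequential order.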
Super User
Posts: 9,673

Re: Efficiency of sorting by character variables vs. numeric variables ?

I would choose the first option.