BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Arun_shSAS
Fluorite | Level 6

Hi Experts, 

 

There is a question in my mind from few days and working on it. 

Suppose say I have an data set a which has a variable id and its not sorted.

data a;
input id;
cards;
1
2
4
5
3
;
run;

 I want this dataset to be sorted without using any proc or order by statement. I am thinking to transpose this data and then use call sort and then again transpose. Need your suggestions. 

 

 

Best Regards, 

Arun 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Option 1:

  Hash object with "ordered:" parameter:

 

data _null_;

  if 0 then set have;

  declare hash h (dataset:'have',ordered:'a');

    h.definekey('id');

    h.definedata(all:'Y');

    h.definedone();

  rc=h.output(dataset:'want');

run;

 

Option 2: You can even drop the ordered parameter if you use the hashexp:0 parameter:

 

data _null_;

  if 0 then set have;

  declare hash h (dataset:'have',hashexp:0);

    h.definekey('id');

    h.definedata(all:'Y');

    h.definedone();

  rc=h.output(dataset:'want2');

run;

 

 

The hashexp:0 parameter tells sas to use only a single "bucket" for the hash object. (2**0=1 bucket.  The default is hashexp:8 implying 2**8=256 buckets).  And each bucket, we have been told by Paul Dorfman in a number of presentations on the SAS hash object, is an AVL tree.  It turns out this AVL tree, when processed through the OUTPUT method generates a sorted dataset when only one bucket is used.

 

If you have duplicate id's, then include the "multidata:'Y'" parameter in the declare hash statement.

 

Why one would go out of their way to do this, I don't know.  Perhaps, because hash objects are memory-resident, there might be less disk input/output in re-ordering data items.

 

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

11 REPLIES 11
PeterClemmensen
Tourmaline | Level 20

How are you planning on transposing the data without using a procedure? 🙂

 

I take it that you are allowed to use IML since you mention call sort?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Why?  Proc sort, and order by are specifically built into SAS and SQL respectively for the sole purpose of ordering data.  What purpose does it serve doing this yourself, you will write code that will be messy, and far slower than the built in functions - plus to get an accurate procedure, you will need to validate it etc.

Arun_shSAS
Fluorite | Level 6
Hi,


Thanks for your reply. But sometimes these types fancy questions can be asked in candidature reviewing.
RW9
Diamond | Level 26 RW9
Diamond | Level 26

If it was me in the chair I would say what I have stated above.  If the interviewier is asking daft stuff like this, then I would considering how to get out of there as fast as possible.  I mean where you going to go, maybe altering memory locations, heck go back to soldering transistors onto a board to order your data.

Kurt_Bremser
Super User

"If you're looking for an idiot who wastes your customer's money by wasting time doing stupid things, then I'm not the right person for you. I'm intelligent and like working hard at work worth doing."

Astounding
PROC Star

Well, if you can't use PROC SORT, it doesn't seem fair that you can use CALL SORTN.  So let's remove that tool as well.  Purely as a theoretical exercise of ocurse.

 

First, find out the number of IDs that you have:

 

data _null_;

set have nobs=_nobs_;

call symputx('n_ids', _nobs_);

run;

 

Then use a DATA step to transpose, sort (without any CALL routines), and output:

 

data want;

array ids {&n_ids} id1-id&n_ids;

do i=1 by 1 until (done);

   set have end=done;

   ids{i} = id;

end;

 

That much loads the IDs into an array.  Next, the same DATA step continues to sort them with no CALL routines:

 

do until (changes=0);

   changes=0;

   do i=2 to &n_ids;

      if ids{i} < ids{i-1} then do;

         * Two IDs are not in order.  Switch them;

         changes + 1;

         temp = ids{i-1};

         ids{i-1} = ids{i};

         ids{i}=temp;

      end;

   end;

end;

 

Yes, DO UNTIL is dangerous and if incorrectly used can result in an infinite loop.

 

When all the switching is complete, output the results in the same DATA step:

 

do i=1 to &n_ids;

   id = ids{i};

   output;

end;

keep id;

run;

 

Most anything can be done.  But as others have asked, would you really want to?

ballardw
Super User

Open a table in tableview with openmode=edit and sort there by clicking on column heading and the sort icon.

 

Write the data out to a different file format and use someone else's sort utility.

 

I could write a very inefficient program comparing two rows of data with lag or dif,retain and a bunch of arrays to swap/write values. Really would not want to.

 

I would really be tempted to ask an interviewer why they want to use a crowbar instead of a screwdriver when the task calls for a screwdriver.

mkeintz
PROC Star

Option 1:

  Hash object with "ordered:" parameter:

 

data _null_;

  if 0 then set have;

  declare hash h (dataset:'have',ordered:'a');

    h.definekey('id');

    h.definedata(all:'Y');

    h.definedone();

  rc=h.output(dataset:'want');

run;

 

Option 2: You can even drop the ordered parameter if you use the hashexp:0 parameter:

 

data _null_;

  if 0 then set have;

  declare hash h (dataset:'have',hashexp:0);

    h.definekey('id');

    h.definedata(all:'Y');

    h.definedone();

  rc=h.output(dataset:'want2');

run;

 

 

The hashexp:0 parameter tells sas to use only a single "bucket" for the hash object. (2**0=1 bucket.  The default is hashexp:8 implying 2**8=256 buckets).  And each bucket, we have been told by Paul Dorfman in a number of presentations on the SAS hash object, is an AVL tree.  It turns out this AVL tree, when processed through the OUTPUT method generates a sorted dataset when only one bucket is used.

 

If you have duplicate id's, then include the "multidata:'Y'" parameter in the declare hash statement.

 

Why one would go out of their way to do this, I don't know.  Perhaps, because hash objects are memory-resident, there might be less disk input/output in re-ordering data items.

 

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Arun_shSAS
Fluorite | Level 6

A very easy method is sort a dataset with proc sort or orderby satetment is indexing.

As we know the index in stored as sorted with manner along with the dataset.

 

 

data a;
input x;
cards;
1
3
4
2
5
;
run;

proc datasets lib=work  noprint;
modify a;
index create x;
run;

data b;
set a;
by x;
run;
Arun_shSAS
Fluorite | Level 6

hi do you have a pdf notes on this .. it will be great for me.

 

Thanks,

Arun

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 7639 views
  • 8 likes
  • 7 in conversation