DATA Step, Macro, Functions and more

Sort Data without Proc sort or order by in proc sql

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 17
Accepted Solution

Sort Data without Proc sort or order by in proc sql

Hi Experts, 

 

There is a question in my mind from few days and working on it. 

Suppose say I have an data set a which has a variable id and its not sorted.

data a;
input id;
cards;
1
2
4
5
3
;
run;

 I want this dataset to be sorted without using any proc or order by statement. I am thinking to transpose this data and then use call sort and then again transpose. Need your suggestions. 

 

 

Best Regards, 

Arun 


Accepted Solutions
Solution
‎02-18-2017 11:28 AM
Valued Guide
Posts: 797

Re: Sort Data without Proc sort or order by in proc sql

Option 1:

  Hash object with "ordered:" parameter:

 

data _null_;

  if 0 then set have;

  declare hash h (dataset:'have',ordered:'a');

    h.definekey('id');

    h.definedata(all:'Y');

    h.definedone();

  rc=h.output(dataset:'want');

run;

 

Option 2: You can even drop the ordered parameter if you use the hashexp:0 parameter:

 

data _null_;

  if 0 then set have;

  declare hash h (dataset:'have',hashexp:0);

    h.definekey('id');

    h.definedata(all:'Y');

    h.definedone();

  rc=h.output(dataset:'want2');

run;

 

 

The hashexp:0 parameter tells sas to use only a single "bucket" for the hash object. (2**0=1 bucket.  The default is hashexp:8 implying 2**8=256 buckets).  And each bucket, we have been told by Paul Dorfman in a number of presentations on the SAS hash object, is an AVL tree.  It turns out this AVL tree, when processed through the OUTPUT method generates a sorted dataset when only one bucket is used.

 

If you have duplicate id's, then include the "multidata:'Y'" parameter in the declare hash statement.

 

Why one would go out of their way to do this, I don't know.  Perhaps, because hash objects are memory-resident, there might be less disk input/output in re-ordering data items.

 

 

 

 

View solution in original post


All Replies
PROC Star
Posts: 552

Re: Sort Data without Proc sort or order by in proc sql

How are you planning on transposing the data without using a procedure? Smiley Happy

 

I take it that you are allowed to use IML since you mention call sort?

Super User
Super User
Posts: 7,405

Re: Sort Data without Proc sort or order by in proc sql

Why?  Proc sort, and order by are specifically built into SAS and SQL respectively for the sole purpose of ordering data.  What purpose does it serve doing this yourself, you will write code that will be messy, and far slower than the built in functions - plus to get an accurate procedure, you will need to validate it etc.

Occasional Contributor
Posts: 17

Re: Sort Data without Proc sort or order by in proc sql

Hi,


Thanks for your reply. But sometimes these types fancy questions can be asked in candidature reviewing.
Super User
Super User
Posts: 7,405

Re: Sort Data without Proc sort or order by in proc sql

If it was me in the chair I would say what I have stated above.  If the interviewier is asking daft stuff like this, then I would considering how to get out of there as fast as possible.  I mean where you going to go, maybe altering memory locations, heck go back to soldering transistors onto a board to order your data.

Occasional Contributor
Posts: 17

Re: Sort Data without Proc sort or order by in proc sql

lol
Super User
Posts: 6,945

Re: Sort Data without Proc sort or order by in proc sql

"If you're looking for an idiot who wastes your customer's money by wasting time doing stupid things, then I'm not the right person for you. I'm intelligent and like working hard at work worth doing."

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 5,085

Re: Sort Data without Proc sort or order by in proc sql

Well, if you can't use PROC SORT, it doesn't seem fair that you can use CALL SORTN.  So let's remove that tool as well.  Purely as a theoretical exercise of ocurse.

 

First, find out the number of IDs that you have:

 

data _null_;

set have nobs=_nobs_;

call symputx('n_ids', _nobs_);

run;

 

Then use a DATA step to transpose, sort (without any CALL routines), and output:

 

data want;

array ids {&n_ids} id1-id&n_ids;

do i=1 by 1 until (done);

   set have end=done;

   ids{i} = id;

end;

 

That much loads the IDs into an array.  Next, the same DATA step continues to sort them with no CALL routines:

 

do until (changes=0);

   changes=0;

   do i=2 to &n_ids;

      if ids{i} < ids{i-1} then do;

         * Two IDs are not in order.  Switch them;

         changes + 1;

         temp = ids{i-1};

         ids{i-1} = ids{i};

         ids{i}=temp;

      end;

   end;

end;

 

Yes, DO UNTIL is dangerous and if incorrectly used can result in an infinite loop.

 

When all the switching is complete, output the results in the same DATA step:

 

do i=1 to &n_ids;

   id = ids{i};

   output;

end;

keep id;

run;

 

Most anything can be done.  But as others have asked, would you really want to?

Super User
Posts: 10,500

Re: Sort Data without Proc sort or order by in proc sql

Open a table in tableview with openmode=edit and sort there by clicking on column heading and the sort icon.

 

Write the data out to a different file format and use someone else's sort utility.

 

I could write a very inefficient program comparing two rows of data with lag or dif,retain and a bunch of arrays to swap/write values. Really would not want to.

 

I would really be tempted to ask an interviewer why they want to use a crowbar instead of a screwdriver when the task calls for a screwdriver.

Solution
‎02-18-2017 11:28 AM
Valued Guide
Posts: 797

Re: Sort Data without Proc sort or order by in proc sql

Option 1:

  Hash object with "ordered:" parameter:

 

data _null_;

  if 0 then set have;

  declare hash h (dataset:'have',ordered:'a');

    h.definekey('id');

    h.definedata(all:'Y');

    h.definedone();

  rc=h.output(dataset:'want');

run;

 

Option 2: You can even drop the ordered parameter if you use the hashexp:0 parameter:

 

data _null_;

  if 0 then set have;

  declare hash h (dataset:'have',hashexp:0);

    h.definekey('id');

    h.definedata(all:'Y');

    h.definedone();

  rc=h.output(dataset:'want2');

run;

 

 

The hashexp:0 parameter tells sas to use only a single "bucket" for the hash object. (2**0=1 bucket.  The default is hashexp:8 implying 2**8=256 buckets).  And each bucket, we have been told by Paul Dorfman in a number of presentations on the SAS hash object, is an AVL tree.  It turns out this AVL tree, when processed through the OUTPUT method generates a sorted dataset when only one bucket is used.

 

If you have duplicate id's, then include the "multidata:'Y'" parameter in the declare hash statement.

 

Why one would go out of their way to do this, I don't know.  Perhaps, because hash objects are memory-resident, there might be less disk input/output in re-ordering data items.

 

 

 

 

Occasional Contributor
Posts: 17

Re: Sort Data without Proc sort or order by in proc sql

A very easy method is sort a dataset with proc sort or orderby satetment is indexing.

As we know the index in stored as sorted with manner along with the dataset.

 

 

data a;
input x;
cards;
1
3
4
2
5
;
run;

proc datasets lib=work  noprint;
modify a;
index create x;
run;

data b;
set a;
by x;
run;
Occasional Contributor
Posts: 17

Re: Sort Data without Proc sort or order by in proc sql

hi do you have a pdf notes on this .. it will be great for me.

 

Thanks,

Arun

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 523 views
  • 8 likes
  • 7 in conversation