DATA Step, Macro, Functions and more

sorting technique in datastep

Accepted Solution Solved
Reply
Contributor
Posts: 63
Accepted Solution

sorting technique in datastep

Hi,

How can we sort a data set with out using proc sort ..

 

for example:

 

id   name

2     aaa

1     bbb

5     ccc

4     ddd

3     eee

 

output:

 

1   bbb

2   aaa

3   eee

4   ddd

5    ccc

 

 


Accepted Solutions
Solution
‎04-23-2017 03:14 PM
PROC Star
Posts: 7,356

Re: sorting technique in datastep

The following takes six times longer than using proc sort, but includes some interesting concepts well worth while learning. Any file, array, or set of arrays can be sorted in either ascending or descending order. @SAShole's macro can do either. It wasn't designed to sort files but, with a slight modification, can easily do that as well. I use it when I need to sort an array but, simultaneously, change the order of a 2nd, 3rd or whatever array. Unfortunately, call sortc and call sortn can't do that!:

 

data have (drop=i);
  input id $ name $ other_stuff $;
  datalines;
b2     aaa xxx
b1     bbb yyy
b5     ccc zzz
b4     ddd aaa
b3     eee bbb
;
run;

/*==================================================================+
 | Name:    Qsort.                                                   |
 |===================================================================|
 | Developer: Paul M. Dorfman.                                       |
 |===================================================================|
 | Function: Sort array in place via the QuickSort algorithm.        |
 |===================================================================|
 | General:                                                          |
 |-------------------------------------------------------------------|
 | Qsort is a macro call routine designed to be invoked from within  |
 | a DATA step to sort a single-subscripted array or a parallel list |
 | of N arrays using one array as a key. The array(s) can be arrays  |
 | SAS variables or a temporary array; numeric or character. Arrays  |
 | in the list do not have to be the same type and/or item length.   |
 |                                                                   |
 | The kernel of the routine is a DATA step implementation of the    |
 | QuickSort algorithm (C.A.R. Hoare, 1960). Final ordering is done  |
 | by a single sweep of the modified insertion sort after quicksort  |
 | has reduced all array subpartitions to no more than &M items.     |
 |-------------------------------------------------------------------|
 | Performance:                                                      |
 |-------------------------------------------------------------------|
 | Run-time increases as N*log2(N), where N is the number of array   |
 | elements to be sorted. As a benchmark, a random numeric temporary |
 | array is sorted into ascending order in lt 1 CPU sec under OS/390 |
 | IBM model 9672 R36 running SAS version 8.0.                       |
 |===================================================================|
 | Usage examples:                                                   |
 |===================================================================|
 | 1. Sorting entire array A ascending (semicolon is optional):      |
 |                                                                   |
 |    %Qsort (Arr=A, Seq=a);                                         |
 |                                                                   |
 |    or simply, using Seq=a default,                                |
 |                                                                   |
 |    %Qsort (Arr=A);                                                |
 |-------------------------------------------------------------------|
 | 2. Sorting entire array B descending:                             |
 |                                                                   |
 |    %Qsort (Arr=B, Seq=d);                                         |
 |-------------------------------------------------------------------|
 | 3. Sorting elements -12345 through 12345 of array Z ascending,    |
 |    leaving the rest of the items intact;                          |
 |                                                                   |
 |    %Qsort (Arr=Z, first=-12345, last=12345);                      |
 |-------------------------------------------------------------------|
 | 4. Sorting first 100 elements of array C descending, the rest -   |
 |    ascending:                                                     |
 |                                                                   |
 |    %Qsort (Arr=C, Seq=d, last= 100);                              |
 |    %Qsort (Arr=C, Seq=a, first=101);                              |
 |-------------------------------------------------------------------|
 | 5. Sorting first half of array D ascending, second half -         |
 |    descending:                                                    |
 |                                                                   |
 |    half = int((lbound(D)+hbound(D))*.5);                          |
 |    %Qsort (Arr=D, Seq=D, last=half   );                           |
 |    %Qsort (Arr=D, Seq=A, first=half+1);                           |
 |                                                                   |
 |    or without HALF and omitting Seq=A by default:                 |
 |                                                                   |
 |    %Qsort (Arr=D, Seq=D, last= (lbound(D)+hbound(D))*.5  );       |
 |    %Qsort (Arr=D,        first=(lbound(D)+hbound(D))*.5+1);       |
 |-------------------------------------------------------------------|
 | 6. Doing the same as in 5 without the auxiliary variable HALF     |
 |    and omitting Seq=A by default:                                 |
 |                                                                   |
 |    %Qsort (Arr=D, Seq=D, last= (lbound(D)+hbound(D))*.5  );       |
 |    %Qsort (Arr=D,        first=(lbound(D)+hbound(D))*.5+1);       |
 |-------------------------------------------------------------------|
 | 7. Sorting parallel arrays A B C using array B as the key array:  |
 |                                                                   |
 |    %Qsort (Arr=A B C, By=C) ;                                     |
 |                                                                   |
 |    Illustration. The following step uses array C as a key:        |
 |    ------------                                                   |
 |    data _null_;                                                   |
 |       array a(10)    ( 9   8   7   6   5   4   3   2   1   0 );   |
 |       array b(10) $  ('a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j');   |
 |       array c(10)    ( 0   1   2   3   4   5   6   7   8   9 );   |
 |      %Qsort (Arr=A B C, By=C, Seq=D);                             |
 |       put A(*) / B(*) / C(*);                                     |
 |    run;                                                           |
 |                                                                   |
 |    It prints the following in the log:                            |
 |                                                                   |
 |    0 1 2 3 4 5 6 7 8 9                                            |
 |    j i h g f e d c b a                                            |
 |    9 8 7 6 5 4 3 2 1 0                                            |
 |                                                                   |
 |    Note: QuickSort is an unstable algorithm, i.e. if the key array|
 |    ----------------------------------------                       |
 |    has ties (duplicate keys), subordinate array elements DO NOT   |
 |                               ---------------------------------   |
 |    retain their original relative order. This behavior is similar |
 |    ------------------------------------                           |
 |    to that of Proc Sort with NOEQUALS option.                     |
 |    The parallel array feature is most useful when it is necessary |
 |    to perform and indirect sort, i.e. with a single parallel array|
 |    containing pointers to long 'records'. For instance, instead of|
 |    giving Qsort the labor of ordering 20 parallel arrays, it is   |
 |    more efficient to store the pointers of the arrays in a single |
 |    parallel array, and then finish the entire thing using a sweep |
 |    of indirect sorting.                                           |
 |                                                                   |
 |    Note: When using LB= and HB= options with parallel arrays, ALL |
 |    ----------------------------------------                       |
 |    lower array bounds must be GE LB=value. Likewise, the upper    |
 |    bounds must all be LE than HB= value. As long as the domain of |
 |    ALL indices lies within an LB= and HB, the indexing of arrays  |
 |    on the list can be arbitrary (in particular, negative).        |
 |===================================================================|
 | Arguments:                                                        |
 |-------------------------------------------------------------------|
 | Parameter  Usage     Description                                  |
 |-------------------------------------------------------------------|
 | Arr=       Required  The name of the array (or list of any number |
 |                      of parallel arrays) to be sorted. By default,|
 |                      the first array in the list becomes the key  |
 |                      array, and the rest of the arrays are permut-|
 |                      ed accordingly.                              |
 |-------------------------------------------------------------------|
 | By=        Optional  The name of the key array in the list. If the|
 |                      list consists of a single array, it is the   |
 |                      key array itself - along with the rule above.|
 |                      See usage for more details.                  |
 |-------------------------------------------------------------------|
 | Seq=       Optional  Sorting sequence. Ascending is default. To   |
 |                      specify it explicitly, set Seq=A (any case). |
 |                      Anything else will result in the array sorted|
 |                      decsending.                                  |
 |-------------------------------------------------------------------|
 | LB=        Optional  The indices of the first and last array      |
 | HB=        Numeric   elements to be included into sorting. You    |
 |                      can specify any valid numeric SAS expression,|
 |                      hardcoded numeric value, or a macrovariable  |
 |                      reference resolving to any of the above. The |
 |                      values of these parms default to the lower   |
 |                      and upper bounds.                            |
 |                      Use LB= and HB= parameters if you need only  |
 |                      part of the array to be ordered, or if you   |
 |                      want different parts ordered differently,    |
 |                      which can be  achieved by issuing two or more|
 |                      consecutive calls to Qsort with LB= and HB=  |
 |                      specified accordingly (see Usage above).     |
 |-------------------------------------------------------------------|
 | M=         Optional  Tuning parm: The largest subpartition size   |
 |            Numeric   Quicksort attempts to partition further. Any |
 |                      subpartition LE &M is passed to the straight |
 |                      insertion sort. &M=1 corresponds to 'pure'   |
 |                      Quicksort working until all subpartitions    |
 |                      have been reduced to just 1 element. &M=9 is |
 |                      optimal. Variations from 5 to 15 affect the  |
 |                      sorting speed very slightly. Best advice as  |
 |                      to M= : Leave it alone at 9.                 |
 +===================================================================*/
 
  %Local _ H I J L N P Q S T W ;

  %Macro Swap (I,J) ;
    %Local W ;
     Do ;
       %Do W = 1 %To &N ;
         &T&W      = &&A&W(&I) ;
         &&A&W(&I) = &&A&W(&J) ;
         &&A&W(&J) =  &T&W     ;
       %End ;
    End ;
  %Mend Swap ;

  %If %Upcase(&Seq) = %Upcase(A) %Then %Let Q = G ;
  %Else                                %Let Q = L ;

  %Do %Until (&&A&N EQ ) ;
    %Let N  = %Eval(&N + 1) ;
    %Local A&N ;
    %Let A&N = %Scan(&Arr,&N,%Str( )) ;
  %End ;
  %Let N = %Eval(&N - 1) ;

  %Let _ = %Substr(%Sysfunc(Ranuni(0)),3,
  %Eval(7 - %Length(&N) + 5*(%Substr(&Sysver,1,1) GT 6))) ;

  %Let H = H&_ ; %Let I = I&_ ; %Let J = J&_ ; %Let L = L&_ ;
  %Let P = P&_ ; %Let S = S&_ ; %Let T = T&_ ; %Let Z = Z&_ ;

  Array &Z (0:1, 0:50) _Temporary_ ;

  &L = &LB ; &H = &HB ;

  If &H - &L GT &M Then Do &S=1 By 0 While (&S) ;
    &J = (&H - &L)/3 ; &I = &L + &J ; &J = &I + &J ;
    If &By(&L) &Q.T &By(&I) Then %Swap(&L,&I) ;
    If &By(&I) &Q.T &By(&J) Then %Swap(&I,&J) ;
    If &By(&J) &Q.T &By(&H) Then %Swap(&J,&H) ;
    If &By(&L) &Q.T &By(&I) Then %Swap(&L,&I) ;
    If &By(&I) &Q.T &By(&J) Then %Swap(&I,&J) ;
    If &By(&L) &Q.T &By(&I) Then %Swap(&L,&I) ;

    %If &M LE 3 %Then %Do ;
      If &H - &L LE 3 Then Do ;
        &L = &Z(0,&S) ; &H = &Z(1,&S) ; &S +- 1 ;
        Continue ;
      End ;
    %End ;

    %Swap(&L,&I) ; &P = &By(&L) ; &I = &L ;

    Do &J=&H + 1 By 0 ;
      Do &I=&I + 1 By  + 1 Until(&By(&I) &Q.E &P) ; End ;
      Do &J=&J - 1 By  - 1 Until(&P &Q.E &By(&J)) ; End ;
      If &I GE &J Then Leave ;
      %Swap(&I,&J) ;
    End ;

    %Swap(&L,&J) ;

    If      &H - &J GE &J - &L GT &M Then Do &S = &S + 1 ;
      &Z(0,&S) = &J + 1 ; &Z(1,&S) = &H ; &H = &J - 1 ;
    End ;
    Else If &J - &L GE &H - &J GT &M Then Do &S = &S + 1 ;
      &Z(0,&S) = &L ; &Z(1,&S) = &J - 1 ; &L = &J + 1 ;
    End ;
    Else If &J - &L GT &M GE &H - &J Then &H = &J - 1 ;
    Else If &H - &J GT &M GE &J - &L Then &L = &J + 1 ;
    Else Do ;
      &L = &Z(0,&S) ; &H = &Z(1,&S) ; &S +- 1 ;
    End ;
  End ;

  %If &M = 1 %Then %Goto Exit ;

  Do &J = &LB + 1 To &HB ;
    If &By(&J - 1) &Q.T &By(&J) Then Do ;
      &P = &By(&J) ;
      %Do W = 1 %To &N ;
        %If &&A&W Ne &By %Then &T&W = &&A&W(&J)  ;  ;
      %End ;
      Do &I = &J - 1 To &LB By  - 1 ;
        If &P &Q.E &By(&I) Then Leave ;
        %Do W = 1 %To &N ;
          &&A&W(&I + 1) = &&A&W(&I) ;
        %End ;
      End ;
      &By(&I + 1) = &P ;
      %Do W = 1 %To &N ;
        %If &&A&W Ne &By %Then &&A&W(&I + 1) = &T&W ; ;
      %End ;
    End ;
  End ;

  %Exit: Drop &H &I &J &L &P &S T&_: ;

%Mend Qsort ;

/*
data have;
 input v1-v11;
 cards4;
 0.004159 0.033589 0 0.277991 0.277991 0.024312 0 0.004479 0 0.050864 0.002239
 0.047345 0 0.098209 0.119962 0.164107 0.004159 0 0.004159 0.002239 0.004798 0.028791
 0 0.071337 0.12476 0.040627 0.089891 0.016955 0 0.042226 0 0 0.022713
 0.016635 0.147473 0.06238 0.051184 0.022393 0.206654 0.002559 0.004479 0.038708 0.121561 0
 0.277351 0 0.158989 0 0 0 0.056942 0.21849 0 0.011836 0.005758
 0.038708 0 0.119642 0 0.012476 0.039987 0.004479 0.00064 0.150992 0.117722 0.050544
 0.00032 0.050224 0 0 0 0.117083 0.135956 0.130838 0.241523 0.03199 0.022073
 0.00128 0.018874 0.029111 0.023992 0.088612 0 0.105886 0.045425 0.109085 0 0
 0.115483 0 0.024312 0.004159 0 0.089571 0.021433 0 0.024952 0.18682 0.026871
 0.003519 0 0 0.00096 0.34293 0.129559 0 0 0 0 0.003839
;;;;
run;

data want (keep=v1-v4 name1-name4);
  set have;
  array A(11) v1-v11;
  array N(11) $ name1-name11;
  do i=1 to dim(A);
    N(i)=vname(A(i));
  end;
  %Qsort (Arr=A N, By=A, Seq=D);
run;
*/

data _null_;
  call symput('nobs',nobs);
  stop;
  set have nobs=nobs;
run;

data want (drop=i);
  array point_var(&nobs.) _temporary_;
  array id_var(&nobs.) $ _temporary_;
  i=0;
  do until (eof);
    set have end = eof nobs=n;
    i+1;
    id_var(i)=id;
    point_var(i)=i;
  end;
  %Qsort (Arr=point_var id_var, By=id_var, Seq=A);
  do i=1 to n;
    x=point_var(i);
    set have point=x;
    output;
  end;
run;

Art, CEO, AnalystFinder.com

 

View solution in original post


All Replies
Super User
Posts: 6,928

Re: sorting technique in datastep

Use proc sql, but technically it does the same.

Or you could use a hash object in a data step, but that is much more complicated than typing 3 lines of proc sort.

Proc sort is the best, performancewise. And it scales fine until you run out of WORK (or UTILLOC) space.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
PROC Star
Posts: 547

Re: sorting technique in datastep

I recommend proc sort, but if this is out of the question, you can use proc sql

 

data have;
input id name $;
datalines;
2 aaa
1 bbb
5 ccc
4 ddd
3 eee
;

proc sql;
   create table want as
   select * 
   from have
   order by id;
quit;
Contributor
Posts: 63

Re: sorting technique in datastep

I need to sort the data in datastep without using procedures..

Contributor
Posts: 63

Re: sorting technique in datastep

I need to sort without using procedures
PROC Star
Posts: 547

Re: sorting technique in datastep

[ Edited ]

This will do the trick, though proc sort is probably still a more efficient and simpler choice

 

data _null_;
   if 0 then set have(keep=id name);
   declare hash sorthave (dataset:'have', ordered:'Y');
      sorthave.definekey ('id');
      sorthave.definedata ('id','name');
      sorthave.definedone ();
   sorthave.output(dataset:'want');
stop;
run;

  

Regular Contributor
Posts: 178

Re: sorting technique in datastep

Here's a paper showing how to do a bubble sort http://support.sas.com/resources/papers/proceedings14/2025-2014.pdf but I can't help but be curious as to why you can't use ANY Procs......

Super User
Posts: 10,466

Re: sorting technique in datastep

I have to say that one of the things I liked about SAS when I first started was not having to write my sort routines from scratch as I had in FORTRAN. Someone would have to point to a very overriding concern to avoid proc sort. Or perhaps you are asking how to use an external (i.e. non-SAS) host sort utility program?

Super User
Posts: 6,928

Re: sorting technique in datastep


molla wrote:
I need to sort without using procedures

This is one of the most stupid things I ever read on the internet. Have the person that gave you this requirement visit a doctor. And don't smoke the same stuff, it seems to be contaminated.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Respected Advisor
Posts: 3,887

Re: sorting technique in datastep

@molla "I need to sort without using procedures"

 

Why?

Contributor
Posts: 63

Re: sorting technique in datastep

I was asked in a interview to sort in data step

Respected Advisor
Posts: 3,887

Re: sorting technique in datastep

Then I hope that your interviewer was after the hash solution @draycut proposed or after some answer which just demonstrates your understanding of the data step as such.

Super User
Posts: 9,662

Re: sorting technique in datastep

Are you sure your boss want a data analysis guy NOT a computer guy ?



data have;
input id   name $;
cards;
2     aaa
1     bbb
5     ccc
4     ddd
3     eee
;
run;
data want;
 set have end=last;
 array x{9999} _temporary_;
 array y{9999} $ 40 _temporary_;
 x{_n_}=id;
 y{_n_}=name;
 if last then do;
  do i=1 to _n_-1;
   do j=i+1 to _n_;
    if x{i} gt x{j} then do;
      t1=x{i};
	  x{i}=x{j};
	  x{j}=t1;

	  t2=y{i};
	  y{i}=y{j};
	  y{j}=t2;
     end;
   end;
  end;

  do k=1 to _n_;
   id=x{k};
   name=y{k};
   output;
  end;
 end;
 keep id name;
run;
Super Contributor
Posts: 254

Re: sorting technique in datastep

Here is sortless Sorting using Key Indexing.

 

[1] If id is a number and that maximum if id is 5 then the following will work:

data have;
input id name $;
datalines;
2     aaa
1     bbb
5     ccc
4     ddd
3     eee
;
run;

data want;
   array k[5] $8 _temporary_;
   if _n_ = 1 then do until(eof);
      set have end = eof;
      k[id] = name;
   end;
   do i = 1 to dim(k);
      id = i;
      name = k[i];
      output;
   end;
stop;
drop i;
run;

proc print data = want;
run;

If ID is a digital string the following can be done while creating HAVE using:

data have(drop = id rename = (iid = id));
input id $ name $;
iid = input(id, 8.);
datalines;
2     aaa
1     bbb
5     ccc
4     ddd
3     eee
;
run;

If the data set, HAVE is very large, the minimum and maximum of ID can't be found, then use the following program to determine them

and the use the program that follows:

data _null_;
   min = constant('BIG');
   max = 0;
   do until(eof);
      set have end = eof;
      if id < min then min = id;
      if id > max then max = id;
   end;
   call symputx('min', min);
   call symputx('max', max);
stop;
run;

%put &min;
%put &max;

data want;
   array k[&min:&max] $8 _temporary_;
   if _n_ = 1 then do until(eof);
      set have end = eof;
      k[id] = name;
   end;
   do i = lbound(k) to hbound(k);
      id = i;
      name = k[i];
      output;
   end;
stop;
drop i;
run;

 

Solution
‎04-23-2017 03:14 PM
PROC Star
Posts: 7,356

Re: sorting technique in datastep

The following takes six times longer than using proc sort, but includes some interesting concepts well worth while learning. Any file, array, or set of arrays can be sorted in either ascending or descending order. @SAShole's macro can do either. It wasn't designed to sort files but, with a slight modification, can easily do that as well. I use it when I need to sort an array but, simultaneously, change the order of a 2nd, 3rd or whatever array. Unfortunately, call sortc and call sortn can't do that!:

 

data have (drop=i);
  input id $ name $ other_stuff $;
  datalines;
b2     aaa xxx
b1     bbb yyy
b5     ccc zzz
b4     ddd aaa
b3     eee bbb
;
run;

/*==================================================================+
 | Name:    Qsort.                                                   |
 |===================================================================|
 | Developer: Paul M. Dorfman.                                       |
 |===================================================================|
 | Function: Sort array in place via the QuickSort algorithm.        |
 |===================================================================|
 | General:                                                          |
 |-------------------------------------------------------------------|
 | Qsort is a macro call routine designed to be invoked from within  |
 | a DATA step to sort a single-subscripted array or a parallel list |
 | of N arrays using one array as a key. The array(s) can be arrays  |
 | SAS variables or a temporary array; numeric or character. Arrays  |
 | in the list do not have to be the same type and/or item length.   |
 |                                                                   |
 | The kernel of the routine is a DATA step implementation of the    |
 | QuickSort algorithm (C.A.R. Hoare, 1960). Final ordering is done  |
 | by a single sweep of the modified insertion sort after quicksort  |
 | has reduced all array subpartitions to no more than &M items.     |
 |-------------------------------------------------------------------|
 | Performance:                                                      |
 |-------------------------------------------------------------------|
 | Run-time increases as N*log2(N), where N is the number of array   |
 | elements to be sorted. As a benchmark, a random numeric temporary |
 | array is sorted into ascending order in lt 1 CPU sec under OS/390 |
 | IBM model 9672 R36 running SAS version 8.0.                       |
 |===================================================================|
 | Usage examples:                                                   |
 |===================================================================|
 | 1. Sorting entire array A ascending (semicolon is optional):      |
 |                                                                   |
 |    %Qsort (Arr=A, Seq=a);                                         |
 |                                                                   |
 |    or simply, using Seq=a default,                                |
 |                                                                   |
 |    %Qsort (Arr=A);                                                |
 |-------------------------------------------------------------------|
 | 2. Sorting entire array B descending:                             |
 |                                                                   |
 |    %Qsort (Arr=B, Seq=d);                                         |
 |-------------------------------------------------------------------|
 | 3. Sorting elements -12345 through 12345 of array Z ascending,    |
 |    leaving the rest of the items intact;                          |
 |                                                                   |
 |    %Qsort (Arr=Z, first=-12345, last=12345);                      |
 |-------------------------------------------------------------------|
 | 4. Sorting first 100 elements of array C descending, the rest -   |
 |    ascending:                                                     |
 |                                                                   |
 |    %Qsort (Arr=C, Seq=d, last= 100);                              |
 |    %Qsort (Arr=C, Seq=a, first=101);                              |
 |-------------------------------------------------------------------|
 | 5. Sorting first half of array D ascending, second half -         |
 |    descending:                                                    |
 |                                                                   |
 |    half = int((lbound(D)+hbound(D))*.5);                          |
 |    %Qsort (Arr=D, Seq=D, last=half   );                           |
 |    %Qsort (Arr=D, Seq=A, first=half+1);                           |
 |                                                                   |
 |    or without HALF and omitting Seq=A by default:                 |
 |                                                                   |
 |    %Qsort (Arr=D, Seq=D, last= (lbound(D)+hbound(D))*.5  );       |
 |    %Qsort (Arr=D,        first=(lbound(D)+hbound(D))*.5+1);       |
 |-------------------------------------------------------------------|
 | 6. Doing the same as in 5 without the auxiliary variable HALF     |
 |    and omitting Seq=A by default:                                 |
 |                                                                   |
 |    %Qsort (Arr=D, Seq=D, last= (lbound(D)+hbound(D))*.5  );       |
 |    %Qsort (Arr=D,        first=(lbound(D)+hbound(D))*.5+1);       |
 |-------------------------------------------------------------------|
 | 7. Sorting parallel arrays A B C using array B as the key array:  |
 |                                                                   |
 |    %Qsort (Arr=A B C, By=C) ;                                     |
 |                                                                   |
 |    Illustration. The following step uses array C as a key:        |
 |    ------------                                                   |
 |    data _null_;                                                   |
 |       array a(10)    ( 9   8   7   6   5   4   3   2   1   0 );   |
 |       array b(10) $  ('a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j');   |
 |       array c(10)    ( 0   1   2   3   4   5   6   7   8   9 );   |
 |      %Qsort (Arr=A B C, By=C, Seq=D);                             |
 |       put A(*) / B(*) / C(*);                                     |
 |    run;                                                           |
 |                                                                   |
 |    It prints the following in the log:                            |
 |                                                                   |
 |    0 1 2 3 4 5 6 7 8 9                                            |
 |    j i h g f e d c b a                                            |
 |    9 8 7 6 5 4 3 2 1 0                                            |
 |                                                                   |
 |    Note: QuickSort is an unstable algorithm, i.e. if the key array|
 |    ----------------------------------------                       |
 |    has ties (duplicate keys), subordinate array elements DO NOT   |
 |                               ---------------------------------   |
 |    retain their original relative order. This behavior is similar |
 |    ------------------------------------                           |
 |    to that of Proc Sort with NOEQUALS option.                     |
 |    The parallel array feature is most useful when it is necessary |
 |    to perform and indirect sort, i.e. with a single parallel array|
 |    containing pointers to long 'records'. For instance, instead of|
 |    giving Qsort the labor of ordering 20 parallel arrays, it is   |
 |    more efficient to store the pointers of the arrays in a single |
 |    parallel array, and then finish the entire thing using a sweep |
 |    of indirect sorting.                                           |
 |                                                                   |
 |    Note: When using LB= and HB= options with parallel arrays, ALL |
 |    ----------------------------------------                       |
 |    lower array bounds must be GE LB=value. Likewise, the upper    |
 |    bounds must all be LE than HB= value. As long as the domain of |
 |    ALL indices lies within an LB= and HB, the indexing of arrays  |
 |    on the list can be arbitrary (in particular, negative).        |
 |===================================================================|
 | Arguments:                                                        |
 |-------------------------------------------------------------------|
 | Parameter  Usage     Description                                  |
 |-------------------------------------------------------------------|
 | Arr=       Required  The name of the array (or list of any number |
 |                      of parallel arrays) to be sorted. By default,|
 |                      the first array in the list becomes the key  |
 |                      array, and the rest of the arrays are permut-|
 |                      ed accordingly.                              |
 |-------------------------------------------------------------------|
 | By=        Optional  The name of the key array in the list. If the|
 |                      list consists of a single array, it is the   |
 |                      key array itself - along with the rule above.|
 |                      See usage for more details.                  |
 |-------------------------------------------------------------------|
 | Seq=       Optional  Sorting sequence. Ascending is default. To   |
 |                      specify it explicitly, set Seq=A (any case). |
 |                      Anything else will result in the array sorted|
 |                      decsending.                                  |
 |-------------------------------------------------------------------|
 | LB=        Optional  The indices of the first and last array      |
 | HB=        Numeric   elements to be included into sorting. You    |
 |                      can specify any valid numeric SAS expression,|
 |                      hardcoded numeric value, or a macrovariable  |
 |                      reference resolving to any of the above. The |
 |                      values of these parms default to the lower   |
 |                      and upper bounds.                            |
 |                      Use LB= and HB= parameters if you need only  |
 |                      part of the array to be ordered, or if you   |
 |                      want different parts ordered differently,    |
 |                      which can be  achieved by issuing two or more|
 |                      consecutive calls to Qsort with LB= and HB=  |
 |                      specified accordingly (see Usage above).     |
 |-------------------------------------------------------------------|
 | M=         Optional  Tuning parm: The largest subpartition size   |
 |            Numeric   Quicksort attempts to partition further. Any |
 |                      subpartition LE &M is passed to the straight |
 |                      insertion sort. &M=1 corresponds to 'pure'   |
 |                      Quicksort working until all subpartitions    |
 |                      have been reduced to just 1 element. &M=9 is |
 |                      optimal. Variations from 5 to 15 affect the  |
 |                      sorting speed very slightly. Best advice as  |
 |                      to M= : Leave it alone at 9.                 |
 +===================================================================*/
 
  %Local _ H I J L N P Q S T W ;

  %Macro Swap (I,J) ;
    %Local W ;
     Do ;
       %Do W = 1 %To &N ;
         &T&W      = &&A&W(&I) ;
         &&A&W(&I) = &&A&W(&J) ;
         &&A&W(&J) =  &T&W     ;
       %End ;
    End ;
  %Mend Swap ;

  %If %Upcase(&Seq) = %Upcase(A) %Then %Let Q = G ;
  %Else                                %Let Q = L ;

  %Do %Until (&&A&N EQ ) ;
    %Let N  = %Eval(&N + 1) ;
    %Local A&N ;
    %Let A&N = %Scan(&Arr,&N,%Str( )) ;
  %End ;
  %Let N = %Eval(&N - 1) ;

  %Let _ = %Substr(%Sysfunc(Ranuni(0)),3,
  %Eval(7 - %Length(&N) + 5*(%Substr(&Sysver,1,1) GT 6))) ;

  %Let H = H&_ ; %Let I = I&_ ; %Let J = J&_ ; %Let L = L&_ ;
  %Let P = P&_ ; %Let S = S&_ ; %Let T = T&_ ; %Let Z = Z&_ ;

  Array &Z (0:1, 0:50) _Temporary_ ;

  &L = &LB ; &H = &HB ;

  If &H - &L GT &M Then Do &S=1 By 0 While (&S) ;
    &J = (&H - &L)/3 ; &I = &L + &J ; &J = &I + &J ;
    If &By(&L) &Q.T &By(&I) Then %Swap(&L,&I) ;
    If &By(&I) &Q.T &By(&J) Then %Swap(&I,&J) ;
    If &By(&J) &Q.T &By(&H) Then %Swap(&J,&H) ;
    If &By(&L) &Q.T &By(&I) Then %Swap(&L,&I) ;
    If &By(&I) &Q.T &By(&J) Then %Swap(&I,&J) ;
    If &By(&L) &Q.T &By(&I) Then %Swap(&L,&I) ;

    %If &M LE 3 %Then %Do ;
      If &H - &L LE 3 Then Do ;
        &L = &Z(0,&S) ; &H = &Z(1,&S) ; &S +- 1 ;
        Continue ;
      End ;
    %End ;

    %Swap(&L,&I) ; &P = &By(&L) ; &I = &L ;

    Do &J=&H + 1 By 0 ;
      Do &I=&I + 1 By  + 1 Until(&By(&I) &Q.E &P) ; End ;
      Do &J=&J - 1 By  - 1 Until(&P &Q.E &By(&J)) ; End ;
      If &I GE &J Then Leave ;
      %Swap(&I,&J) ;
    End ;

    %Swap(&L,&J) ;

    If      &H - &J GE &J - &L GT &M Then Do &S = &S + 1 ;
      &Z(0,&S) = &J + 1 ; &Z(1,&S) = &H ; &H = &J - 1 ;
    End ;
    Else If &J - &L GE &H - &J GT &M Then Do &S = &S + 1 ;
      &Z(0,&S) = &L ; &Z(1,&S) = &J - 1 ; &L = &J + 1 ;
    End ;
    Else If &J - &L GT &M GE &H - &J Then &H = &J - 1 ;
    Else If &H - &J GT &M GE &J - &L Then &L = &J + 1 ;
    Else Do ;
      &L = &Z(0,&S) ; &H = &Z(1,&S) ; &S +- 1 ;
    End ;
  End ;

  %If &M = 1 %Then %Goto Exit ;

  Do &J = &LB + 1 To &HB ;
    If &By(&J - 1) &Q.T &By(&J) Then Do ;
      &P = &By(&J) ;
      %Do W = 1 %To &N ;
        %If &&A&W Ne &By %Then &T&W = &&A&W(&J)  ;  ;
      %End ;
      Do &I = &J - 1 To &LB By  - 1 ;
        If &P &Q.E &By(&I) Then Leave ;
        %Do W = 1 %To &N ;
          &&A&W(&I + 1) = &&A&W(&I) ;
        %End ;
      End ;
      &By(&I + 1) = &P ;
      %Do W = 1 %To &N ;
        %If &&A&W Ne &By %Then &&A&W(&I + 1) = &T&W ; ;
      %End ;
    End ;
  End ;

  %Exit: Drop &H &I &J &L &P &S T&_: ;

%Mend Qsort ;

/*
data have;
 input v1-v11;
 cards4;
 0.004159 0.033589 0 0.277991 0.277991 0.024312 0 0.004479 0 0.050864 0.002239
 0.047345 0 0.098209 0.119962 0.164107 0.004159 0 0.004159 0.002239 0.004798 0.028791
 0 0.071337 0.12476 0.040627 0.089891 0.016955 0 0.042226 0 0 0.022713
 0.016635 0.147473 0.06238 0.051184 0.022393 0.206654 0.002559 0.004479 0.038708 0.121561 0
 0.277351 0 0.158989 0 0 0 0.056942 0.21849 0 0.011836 0.005758
 0.038708 0 0.119642 0 0.012476 0.039987 0.004479 0.00064 0.150992 0.117722 0.050544
 0.00032 0.050224 0 0 0 0.117083 0.135956 0.130838 0.241523 0.03199 0.022073
 0.00128 0.018874 0.029111 0.023992 0.088612 0 0.105886 0.045425 0.109085 0 0
 0.115483 0 0.024312 0.004159 0 0.089571 0.021433 0 0.024952 0.18682 0.026871
 0.003519 0 0 0.00096 0.34293 0.129559 0 0 0 0 0.003839
;;;;
run;

data want (keep=v1-v4 name1-name4);
  set have;
  array A(11) v1-v11;
  array N(11) $ name1-name11;
  do i=1 to dim(A);
    N(i)=vname(A(i));
  end;
  %Qsort (Arr=A N, By=A, Seq=D);
run;
*/

data _null_;
  call symput('nobs',nobs);
  stop;
  set have nobs=nobs;
run;

data want (drop=i);
  array point_var(&nobs.) _temporary_;
  array id_var(&nobs.) $ _temporary_;
  i=0;
  do until (eof);
    set have end = eof nobs=n;
    i+1;
    id_var(i)=id;
    point_var(i)=i;
  end;
  %Qsort (Arr=point_var id_var, By=id_var, Seq=A);
  do i=1 to n;
    x=point_var(i);
    set have point=x;
    output;
  end;
run;

Art, CEO, AnalystFinder.com

 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 302 views
  • 8 likes
  • 9 in conversation