Help using Base SAS procedures

Getting Top and Bottom 5 distinct values

Reply
Super Contributor
Posts: 398

Getting Top and Bottom 5 distinct values

I have 32 variables that I need to get the top 5 and bottom five distinct values. I'm not sure what would be the cleanest way to do this. Any help pointing me in the right direction would be greatly appreciated.

Thank you
Valued Guide
Posts: 2,175

Re: Getting Top and Bottom 5 distinct values

proc means
topic IDGROUP

see Example 12: Identifying the Top Three Extreme Values with the Output Statistics
at http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a001016379.htm added link to example Message was edited by: Peter.C
Respected Advisor
Posts: 3,777

Re: Getting Top and Bottom 5 distinct values

As mentioned the DISTINCT restriction means that to do this with PROC SUMMARY would require the data be summarized and then the top and bottom five chosen. With lots of levels of the analysis variables which are used as CLASS variables in the first step this would lose the performance race.

[pre]
%let vars=Age Height Weight;
proc summary data=sashelp.class chartype;
class &vars;
ways 1;
output out=freqs(rename=(_type_=type _freq_=N));
run;
proc print data=freqs;
run;
proc summary data=freqs;
by type;
freq N;
output out=bot idgroup(min(&vars) obs missing out[5](&vars N)=);
output out=top idgroup(max(&vars) obs missing out[5](&vars N)=);
run;
data five;
length position $3;
set bot top indsname=indsname;
by type;
position = scan(indsname,-1);
run;
proc print data=five;
run;
[/pre]
Frequent Contributor
Posts: 81

Re: Getting Top and Bottom 5 distinct values

Alternatively,

You may use proc sql or proc sort to select distinct values, order them by their key variables, and use point option in data step to select the records of your choice.

proc sql noprint;
create table two as select distinct * from one order by var1,var2;
select count(*) into :n from two;
quit;

data three;
do rcrd=1,2,3,4,5,(&n-4),(&n-3),(&n-2),(&n-1),&n;
set two point=rcrd;
output;
end;
stop;
run;
Regular Contributor
Posts: 171

Re: Getting Top and Bottom 5 distinct values

You could also easily transpose the dataset and use arrays to find the 5 smallest and largest. Here is an example with a made up dataset:

[pre]
data test;
array var{32};
do x=1 to 1000;
do i=-15 to 16;
var{i+16} = i*ranuni(0);
end;
output;
end;
drop i x;
run;

proc transpose data=test out=test_transpose name=var;
run;

data results;
set test_transpose;
array col{*} col:;
array max_{5};
array min_{5};
do i=1 to 5;
max_{i} = max(of col
  • );
    min_{i} = min(of col
  • );
    do j=1 to dim(col);
    if col{j}=min_{i} then col{j}= .;
    if col{j}=max_{i} then col{j}=.;
    end;
    end;
    keep var max_: min_:;
    run;
    [/pre]
  • Super Contributor
    Posts: 398

    Re: Getting Top and Bottom 5 distinct values

    Thank you everyone for your help.

    This is how I ended up doing it:

    proc sql outobs=5 ;
    create table temp_top5
    as
    select distinct variable1, 'TOP' as source
    from temp
    order by variable1
    ;
    quit ;

    I do the same for the bottom and it gave me what I was looking for. Thank you for all of your suggestions.
    Regular Contributor
    Posts: 171

    Re: Getting Top and Bottom 5 distinct values

    Are you planning to repeat those two SQL steps for all 32 variables in your dataset? Why use 64 steps to accomplish something that could be done so much more concisely? Or, maybe I just misunderstood your original posting.
    Valued Guide
    Posts: 2,175

    Re: Getting Top and Bottom 5 distinct values

    does that mean running once for each variable? 32times !

    If your input data set will fit in memory, before the 32 runs, do
    sasfile work.temp open ;
    Super User
    Posts: 3,105

    Re: Getting Top and Bottom 5 distinct values

    Here is another way of doing this:

    ods output ExtremeValues = extreme;

    proc univariate data = sashelp.class;
    var age;
    run;

    ods output close;

    The ODS statement creates a dataset EXTREME with the top and bottom 5 values.

    If you have more than one variable just list them in the VAR statement. Message was edited by: SASKiwi
    Regular Contributor
    Posts: 241

    Re: Getting Top and Bottom 5 distinct values

    Here is a one-pass solution that does not involve transposing variables. There are many ways to make my solution run even faster, but left as an exercise for those who are interested. :-)



       /* modified polingjw^s test data */


       data test;


          array var{32} var01-var32;


          do x=1 to 1e6;


             do i=-15 to 16;


                var{i+16} = i*ranuni(0);


             end;


             output;


          end;


          drop i x;


       run;


     


       /* polingjw */


       proc transpose data=test out=test_transpose name=var;


       run;


       data polingjw;


          set test_transpose;


          array col{*} col:;


          array max_{5};


          array min_{5};


          do i=1 to 5;


             max_{i} = max(of col

  • );

  •          min_{i} = min(of col

  • );

  •          do j=1 to dim(col);


                if col{j}=min_{i} then col{j}= .;


                if col{j}=max_{i} then col{j}=.;


             end;


          end;


          keep  var max_: min_:;


       run;


       /* re-shape to long for comparisons */


       data polingjwLong;


         set polingjw;


         length var $32 type $6 num value 8;


         keep var type num value;


         array max_(5) max_1 - max_5;


         array min_(5) min_1 - min_5;


         do num = 1 to 5;


            type = "top";


            value = max_(num);


            output;


         end;


         do num = 1 to 5;


            type = "bottom";


            value = min_(num);


            output;


         end;


       run;   


     


       /* chang */


       %let maxNumVars = 3000; 


       %let top = 5; 


     


       data chang;


          set test end=end;


     


          array top(1:&maxNumVars,-&top:&top) _temporary_;


          array cur(*) _numeric_;


          do k = 1 to dim(cur);


             if missing(cur(k)) then leave;


             link doTop;


             link doBottom;


          end;


     


          if end then link doOutput;


          return;


     


          doTop:


             do j = 1 to ⊤


                if missing(top(k,j)) then do;


                   top(k,j) = cur(k);


                   leave;


                end


                if (cur(k) < top(k,j)) then continue;


                if (cur(k) = top(k,j)) then leave;       


                do i = &top to j+1 by -1;


                   top(k, i) = top(k, i-1);


                end;


                top(k, j) = cur(k);


                leave;


             end


          return;


          


          doBottom:


             do j = 1 to &top;


                if missing(top(k,-j)) then do;


                   top(k,-j) = cur(k);


                   leave;


                end;


                if (cur(k) > top(k,-j)) then continue;


                if (cur(k) = top(k,-j)) then leave;


                do i = &top to j+1 by -1;


                   top(k,-i) = top(k,-i+1);


                end;


                top(k,-j) = cur(k);


                leave;


             end


          return;


     


          doOutput:


             length var $32 type $6;


             do k = 1 to dim(cur);


                var = vname(cur(k));


                type = "top";


                do num = 1 to &top;


                   value = top(k, num); 


                   output;


                end;


                type = "bottom";


                do num = 1 to &top;


                   value = top(k,-num);


                   output;


                end;


             end;


             keep var type num value;


          return;


       run


     


       proc compare base=polingjwLong comp=chang;


       run;


       /* on log, in part


       NOTE: No unequal values were found. All values compared are exactly equal.


       */

    Regular Contributor
    Posts: 171

    Re: Getting Top and Bottom 5 distinct values

    Chang,

    I’m impressed by your much more efficient program. In a dataset with one million observations, your program runs in about 1/4th the time that mine does. I actually thought about using a multi-dimensional array solution when I first read the original posting but quickly gave up once I started getting confused by my own program. Thanks for demonstrating the better path forward!
    Valued Guide
    Posts: 2,175

    Re: Getting Top and Bottom 5 distinct values

    polingjw and chang
    how does the array approach compare with the ods output approach from SASKIWI at http://support.sas.com/forums/thread.jspa?messageID=49801#49801

    peterC
    (an interested by-stander)
    Regular Contributor
    Posts: 171

    Re: Getting Top and Bottom 5 distinct values

    On my computer, Chang’s solution took 32.54 seconds, real time. The univariate procedure took 1:59.84, almost four times as long. However, the results are not equivalent. The original posting asked for the top five and bottom five distinct values. The univariate procedure does not produce distinct results.
    Valued Guide
    Posts: 2,175

    Re: Getting Top and Bottom 5 distinct values

    thank you
    that is a stunning comparison!
    Super User
    Posts: 9,681

    Re: Getting Top and Bottom 5 distinct values

    Hi.
    polingjw .
    You are wrong ,
    In the documentation about proc univariate ,there is an option which can select distinct top and bottom value.
    If you would ,can search it.
    proc univariate is more powerful than you image.


    Ksharp
    Ask a Question
    Discussion stats
    • 18 replies
    • 5702 views
    • 0 likes
    • 9 in conversation