## Getting Top and Bottom 5 distinct values

Super Contributor
Posts: 405

# Getting Top and Bottom 5 distinct values

I have 32 variables that I need to get the top 5 and bottom five distinct values. I'm not sure what would be the cleanest way to do this. Any help pointing me in the right direction would be greatly appreciated.

Thank you
Valued Guide
Posts: 2,191

## Re: Getting Top and Bottom 5 distinct values

proc means
topic IDGROUP

see Example 12: Identifying the Top Three Extreme Values with the Output Statistics
Posts: 3,852

## Re: Getting Top and Bottom 5 distinct values

As mentioned the DISTINCT restriction means that to do this with PROC SUMMARY would require the data be summarized and then the top and bottom five chosen. With lots of levels of the analysis variables which are used as CLASS variables in the first step this would lose the performance race.

[pre]
%let vars=Age Height Weight;
proc summary data=sashelp.class chartype;
class &vars;
ways 1;
output out=freqs(rename=(_type_=type _freq_=N));
run;
proc print data=freqs;
run;
proc summary data=freqs;
by type;
freq N;
output out=bot idgroup(min(&vars) obs missing out[5](&vars N)=);
output out=top idgroup(max(&vars) obs missing out[5](&vars N)=);
run;
data five;
length position \$3;
set bot top indsname=indsname;
by type;
position = scan(indsname,-1);
run;
proc print data=five;
run;
[/pre]
Frequent Contributor
Posts: 81

## Re: Getting Top and Bottom 5 distinct values

Alternatively,

You may use proc sql or proc sort to select distinct values, order them by their key variables, and use point option in data step to select the records of your choice.

proc sql noprint;
create table two as select distinct * from one order by var1,var2;
select count(*) into :n from two;
quit;

data three;
do rcrd=1,2,3,4,5,(&n-4),(&n-3),(&n-2),(&n-1),&n;
set two point=rcrd;
output;
end;
stop;
run;
Regular Contributor
Posts: 171

## Re: Getting Top and Bottom 5 distinct values

You could also easily transpose the dataset and use arrays to find the 5 smallest and largest. Here is an example with a made up dataset:

[pre]
data test;
array var{32};
do x=1 to 1000;
do i=-15 to 16;
var{i+16} = i*ranuni(0);
end;
output;
end;
drop i x;
run;

proc transpose data=test out=test_transpose name=var;
run;

data results;
set test_transpose;
array col{*} col:;
array max_{5};
array min_{5};
do i=1 to 5;
max_{i} = max(of col
• );
min_{i} = min(of col
• );
do j=1 to dim(col);
if col{j}=min_{i} then col{j}= .;
if col{j}=max_{i} then col{j}=.;
end;
end;
keep var max_: min_:;
run;
[/pre]
• Super Contributor
Posts: 405

## Re: Getting Top and Bottom 5 distinct values

Thank you everyone for your help.

This is how I ended up doing it:

proc sql outobs=5 ;
create table temp_top5
as
select distinct variable1, 'TOP' as source
from temp
order by variable1
;
quit ;

I do the same for the bottom and it gave me what I was looking for. Thank you for all of your suggestions.
Regular Contributor
Posts: 171

## Re: Getting Top and Bottom 5 distinct values

Are you planning to repeat those two SQL steps for all 32 variables in your dataset? Why use 64 steps to accomplish something that could be done so much more concisely? Or, maybe I just misunderstood your original posting.
Valued Guide
Posts: 2,191

## Re: Getting Top and Bottom 5 distinct values

does that mean running once for each variable? 32times !

If your input data set will fit in memory, before the 32 runs, do
sasfile work.temp open ;
Super User
Posts: 3,918

## Re: Getting Top and Bottom 5 distinct values

Here is another way of doing this:

ods output ExtremeValues = extreme;

proc univariate data = sashelp.class;
var age;
run;

ods output close;

The ODS statement creates a dataset EXTREME with the top and bottom 5 values.

If you have more than one variable just list them in the VAR statement. Message was edited by: SASKiwi
Regular Contributor
Posts: 241

## Re: Getting Top and Bottom 5 distinct values

Here is a one-pass solution that does not involve transposing variables. There are many ways to make my solution run even faster, but left as an exercise for those who are interested. :-)

/* modified polingjw^s test data */

data test;

array var{32} var01-var32;

do x=1 to 1e6;

do i=-15 to 16;

var{i+16} = i*ranuni(0);

end;

output;

end;

drop i x;

run;

/* polingjw */

proc transpose data=test out=test_transpose name=var;

run;

data polingjw;

set test_transpose;

array col{*} col:;

array max_{5};

array min_{5};

do i=1 to 5;

max_{i} = max(of col

• );

•          min_{i} = min(of col

• );

•          do j=1 to dim(col);

if col{j}=min_{i} then col{j}= .;

if col{j}=max_{i} then col{j}=.;

end;

end;

keep  var max_: min_:;

run;

/* re-shape to long for comparisons */

data polingjwLong;

set polingjw;

length var \$32 type \$6 num value 8;

keep var type num value;

array max_(5) max_1 - max_5;

array min_(5) min_1 - min_5;

do num = 1 to 5;

type = "top";

value = max_(num);

output;

end;

do num = 1 to 5;

type = "bottom";

value = min_(num);

output;

end;

run;

/* chang */

%let maxNumVars = 3000;

%let top = 5;

data chang;

set test end=end;

array top(1:&maxNumVars,-&top:&top) _temporary_;

array cur(*) _numeric_;

do k = 1 to dim(cur);

if missing(cur(k)) then leave;

end;

return;

doTop:

do j = 1 to &top;

if missing(top(k,j)) then do;

top(k,j) = cur(k);

leave;

end

if (cur(k) < top(k,j)) then continue;

if (cur(k) = top(k,j)) then leave;

do i = &top to j+1 by -1;

top(k, i) = top(k, i-1);

end;

top(k, j) = cur(k);

leave;

end

return;

doBottom:

do j = 1 to &top;

if missing(top(k,-j)) then do;

top(k,-j) = cur(k);

leave;

end;

if (cur(k) > top(k,-j)) then continue;

if (cur(k) = top(k,-j)) then leave;

do i = &top to j+1 by -1;

top(k,-i) = top(k,-i+1);

end;

top(k,-j) = cur(k);

leave;

end

return;

doOutput:

length var \$32 type \$6;

do k = 1 to dim(cur);

var = vname(cur(k));

type = "top";

do num = 1 to &top;

value = top(k, num);

output;

end;

type = "bottom";

do num = 1 to &top;

value = top(k,-num);

output;

end;

end;

keep var type num value;

return;

run

proc compare base=polingjwLong comp=chang;

run;

/* on log, in part

NOTE: No unequal values were found. All values compared are exactly equal.

*/

Regular Contributor
Posts: 171

## Re: Getting Top and Bottom 5 distinct values

Chang,

I’m impressed by your much more efficient program. In a dataset with one million observations, your program runs in about 1/4th the time that mine does. I actually thought about using a multi-dimensional array solution when I first read the original posting but quickly gave up once I started getting confused by my own program. Thanks for demonstrating the better path forward!
Valued Guide
Posts: 2,191

## Re: Getting Top and Bottom 5 distinct values

polingjw and chang
how does the array approach compare with the ods output approach from SASKIWI at http://support.sas.com/forums/thread.jspa?messageID=49801#49801

peterC
(an interested by-stander)
Regular Contributor
Posts: 171

## Re: Getting Top and Bottom 5 distinct values

On my computer, Chang’s solution took 32.54 seconds, real time. The univariate procedure took 1:59.84, almost four times as long. However, the results are not equivalent. The original posting asked for the top five and bottom five distinct values. The univariate procedure does not produce distinct results.
Valued Guide
Posts: 2,191

## Re: Getting Top and Bottom 5 distinct values

thank you
that is a stunning comparison!
Super User
Posts: 10,770

## Re: Getting Top and Bottom 5 distinct values

Hi.
polingjw .
You are wrong ,
In the documentation about proc univariate ,there is an option which can select distinct top and bottom value.
If you would ,can search it.
proc univariate is more powerful than you image.

Ksharp
Discussion stats
• 18 replies
• 7907 views
• 0 likes
• 9 in conversation