DATA Step, Macro, Functions and more

Find highest values and output the variable name

Reply
Frequent Contributor
Posts: 78

Find highest values and output the variable name

I want to array a group of variables and output the 5 highest values into 5 new variables.  I know how to do this, but I don't know how to indicate which variable the highest values came from.  Does anyone know how this can be done?

Thanks in advance.

Valued Guide
Posts: 765

Re: Find highest values and output the variable name

Hi ... not sure how you deal with TIES (multiple variables with same value within an observation), but if there are not any

duplicate large values within an observation, this should work ... 

* some data;

data x;

array x(20);

do j=1 to 10;

do k= 1 to 20;

   x(k) = ceil(10000*ranuni(999));

end;

output;

end;

keep x1-x20;

run;

* use WHICHN and VNAME to get the variable names ... put into variables z1-z5);

data y;

set x;

array x(20);

array y(5);

array z(5) $32;

do j = 1 to 5;

   y(j) = largest(j, of xSmiley Happy;

   var = whichn(y(j),of xSmiley Happy;

   z(j) = vname(x(var));

end;

keep y: z:;

run;

y1      y2      y3      y4      y5     z1     z2     z3     z4     z5

8968    7881    7777    7443    6810    x16    x19    x1     x9     x5

9990    9956    9568    9384    9257    x9     x3     x18    x4     x16

9297    8066    7714    7355    7072    x11    x9     x10    x18    x4

9969    9903    8598    8386    8184    x10    x7     x1     x20    x18

9947    9512    9398    8740    7714    x11    x17    x18    x1     x4

9609    9528    8581    7187    7028    x10    x4     x9     x5     x7

9550    9403    8798    8024    5612    x9     x6     x12    x17    x2

9727    9605    9173    8484    8360    x14    x18    x13    x3     x2

9279    9091    9010    8755    8178    x2     x10    x6     x14    x17

9898    8788    8753    8444    8188    x13    x20    x16    x19    x17

I guess if there were ties , you could resort to  transposing the data and using PROC RANK.  You could

end up with more than 5 largest for each observation (or do something more elaborate with the above ARRAY

based code).


Valued Guide
Posts: 765

Re: Find highest values and output the variable name

Hi ... OK, upon a little more thought,  I think that this works with ties.  If there are multiple large values, this allows

up to 10 variables  (Y1-Y10) to hold the 5 largest values and variable names (Z1-Z10) ...

* some data ... restricted range of X1-X20 leads to some duplicate values within an observation;

data x;

array x(20);

do j=1 to 10;

do k= 1 to 20;

   x(k) = ceil(100*ranuni(999));

end;

output;

end;

keep x1-x20;

run;

data y;

set x;

array x(20);

array y(10);

array z(10) $32;

do j = 1 by 1 until(count eq 5 or j eq 10);

   y(j) = largest(j, of xSmiley Happy;

   var = whichn(y(j),of xSmiley Happy;

   z(j) = vname(x(var));

   if x(var) ne hold then do; count = sum(count,1); hold=x(var); x(var) = 1e6 ; end;

end;

keep y: z:;

run;



largest5.png
Frequent Contributor
Posts: 78

Re: Find highest values and output the variable name

Thank you so much, I will give this a try.

Super User
Posts: 10,516

Re: Find highest values and output the variable name

Is there going to be any particular rule about choosing one variable over another in the case of tied values?

Frequent Contributor
Posts: 78

Re: Find highest values and output the variable name

There are no rules, but I know there will be no more than 5 values that can be the same.  I'm actually looking for the top 3 values, but extended it to 5 in order to pick up a 5 way tie.

Ask a Question
Discussion stats
  • 5 replies
  • 934 views
  • 1 like
  • 3 in conversation