BookmarkSubscribeRSS Feed
JayCompany
Calcite | Level 5

Hi,

   I have the following table: Description, ID #,  A, B, C, D, E, F, G. I would like one table back with the top 10 highest values of each individual column arranged the same way, ie.  Description, ID #,  Top 10 in A, Top 10 in B....Top 10 in G. How would I go about doing this? The values in A, B, C, D, E, F, G are all integers. 

 


Regards,

JayCompany 

 

 

 

 

 

 

7 REPLIES 7
ballardw
Super User

@JayCompany wrote:

Hi,

   I have the following table: Description, ID #,  A, B, C, D, E, F, G. I would like one table back with the top 10 highest values of each individual column arranged the same way, ie.  Description, ID #,  Top 10 in A, Top 10 in B....Top 10 in G. How would I go about doing this? The values in A, B, C, D, E, F, G are all integers. 

 


Show some actual example of starting data and what the result should look like. Likely only need to show the results for 2 variables.

I am not sure I am visualizing your desired result very clearly

 

One problem: "top 10" is not a very specific description. "top" could be the largest or the smallest depending on a ranking scale.

Also, repeat values might be of concern since we can't tell if you want ties or not.

 

Astounding
PROC Star

How could there be just one ID column in the final table?  Would that mean that the same ID has the top value for all variables?

JayCompany
Calcite | Level 5

No, the same ID doesn't have to have the top value for all variables..I just want a list of all the IDS that have the top variables for each of the different columns.

JayCompany
Calcite | Level 5

Okay, so for example:

 

Here's what I have right now:

 

Description, ID #,  A, B

Cars, X123, 500, 1000

Cars, X124, 600, 2000

Cars, X125, 200, 500

Cars, X126, 800, 6000

 

 

I'd like this in return,  Highest 3 values from each variable A and B:

 

Description, ID #,  A

Cars, X126, 800

Cars, X124, 600

Cars, X123, 500

 

Description, ID #, B

Cars, X126, 6000

Cars, X124, 2000

Cars, X123, 1000

 

But Ideally, combined together into one Table.

 

Thanks!

 

 

 

 

 

 

PGStats
Opal | Level 21

Here is one of many ways to achieve this:

 

data have;
input Desc $ ID $ A B;
datalines;
A X123 500 1000
A X124 600 2000
A X125 200 500
A X126 800 6000
;

proc transpose data=have out=temp name=var;
by Desc id;
var a--b;
run;

proc sort data=temp; by Desc var descending col1; run;

data want; 
set temp; by desc var;
if first.var then count=0;
count + 1;
if count <= 3;
drop count;
rename col1=value;
run;

proc print noobs; run;
PG
s_lassen
Meteorite | Level 14

One possibility is to use hash tables, you basically make an ordered hash table for each variable, putting your values in until the number you want is reached, and then repeatedly deleting the lowest (first) value.

 

To keep track of the hash tables you create another hash table(TREE) , with the variable name (VAR) as key.

 

Here is an example which finds the 3 highest values of A and B in you example data:

%let n=3; /* rank that we want */
%let vars='A','B';  /* variables that we want */
%let id='Desc','ID'; /* id variable(s) */
data want;
  set have end=done;
  if _N_=1 then do;
    declare hash h;
    declare hiter i;
    length var $32 value $50;
    retain order 'A';  /* Ascending/Descending, A means highest values, D lowest */
    declare hash tree();
    tree.definekey('var');
    tree.definedata('h','i');
    tree.definedone();
    do var=&vars;
      h=_new_ hash(multidata: 'Y', ordered: order);
      h.definekey(var);
      h.definedata(var,&id);
      h.definedone();
      i=_new_ hiter('h');
      rc=tree.add();
      end;
    end;
  do var=&vars;
    rc=tree.find();
    rc=h.add();
    if h.num_items>&n then do;
      rc=i.first();  /* get item to delete */
      rc=i.prev(); /* release iterator so that we can remove item */
      rc=h.remove();
      end;
    if done then do;
      rc=i.last();  /* get highest/lowest item */
      do rank=1 by 1 until(i.prev());
        value=vvaluex(var); /* get formatted value */
        output;
        end;
      end;
    end;
  keep desc id var value order rank;
run;

 I use the VVALUEX function to return formatted values. That way you can have the character representations of different variable types (e.g. character, money, integer, date) in the same character variable. If you need to find lowest values, change the ORDER variable to D.

PaigeMiller
Diamond | Level 26

You could run PROC RANK one time on all variables of interest.

 

Then it's a simple sort of the ranks to give the top N in each column.

--
Paige Miller

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 3109 views
  • 1 like
  • 6 in conversation