DATA Step, Macro, Functions and more

Extract Top 10 Records of each column in an original Table.

Reply
Occasional Contributor
Posts: 8

Extract Top 10 Records of each column in an original Table.

Hi,

   I have the following table: Description, ID #,  A, B, C, D, E, F, G. I would like one table back with the top 10 highest values of each individual column arranged the same way, ie.  Description, ID #,  Top 10 in A, Top 10 in B....Top 10 in G. How would I go about doing this? The values in A, B, C, D, E, F, G are all integers. 

 


Regards,

JayCompany 

 

 

 

 

 

 

Super User
Posts: 13,941

Re: Extract Top 10 Records of each column in an original Table.

Posted in reply to JayCompany

JayCompany wrote:

Hi,

   I have the following table: Description, ID #,  A, B, C, D, E, F, G. I would like one table back with the top 10 highest values of each individual column arranged the same way, ie.  Description, ID #,  Top 10 in A, Top 10 in B....Top 10 in G. How would I go about doing this? The values in A, B, C, D, E, F, G are all integers. 

 


Show some actual example of starting data and what the result should look like. Likely only need to show the results for 2 variables.

I am not sure I am visualizing your desired result very clearly

 

One problem: "top 10" is not a very specific description. "top" could be the largest or the smallest depending on a ranking scale.

Also, repeat values might be of concern since we can't tell if you want ties or not.

 

Super User
Posts: 6,935

Re: Extract Top 10 Records of each column in an original Table.

Posted in reply to JayCompany

How could there be just one ID column in the final table?  Would that mean that the same ID has the top value for all variables?

Occasional Contributor
Posts: 8

Re: Extract Top 10 Records of each column in an original Table.

Posted in reply to Astounding

No, the same ID doesn't have to have the top value for all variables..I just want a list of all the IDS that have the top variables for each of the different columns.

Occasional Contributor
Posts: 8

Re: Extract Top 10 Records of each column in an original Table.

Posted in reply to JayCompany

Okay, so for example:

 

Here's what I have right now:

 

Description, ID #,  A, B

Cars, X123, 500, 1000

Cars, X124, 600, 2000

Cars, X125, 200, 500

Cars, X126, 800, 6000

 

 

I'd like this in return,  Highest 3 values from each variable A and B:

 

Description, ID #,  A

Cars, X126, 800

Cars, X124, 600

Cars, X123, 500

 

Description, ID #, B

Cars, X126, 6000

Cars, X124, 2000

Cars, X123, 1000

 

But Ideally, combined together into one Table.

 

Thanks!

 

 

 

 

 

 

Esteemed Advisor
Posts: 5,626

Re: Extract Top 10 Records of each column in an original Table.

Posted in reply to JayCompany

Here is one of many ways to achieve this:

 

data have;
input Desc $ ID $ A B;
datalines;
A X123 500 1000
A X124 600 2000
A X125 200 500
A X126 800 6000
;

proc transpose data=have out=temp name=var;
by Desc id;
var a--b;
run;

proc sort data=temp; by Desc var descending col1; run;

data want; 
set temp; by desc var;
if first.var then count=0;
count + 1;
if count <= 3;
drop count;
rename col1=value;
run;

proc print noobs; run;
PG
PROC Star
Posts: 277

Re: Extract Top 10 Records of each column in an original Table.

Posted in reply to JayCompany

One possibility is to use hash tables, you basically make an ordered hash table for each variable, putting your values in until the number you want is reached, and then repeatedly deleting the lowest (first) value.

 

To keep track of the hash tables you create another hash table(TREE) , with the variable name (VAR) as key.

 

Here is an example which finds the 3 highest values of A and B in you example data:

%let n=3; /* rank that we want */
%let vars='A','B';  /* variables that we want */
%let id='Desc','ID'; /* id variable(s) */
data want;
  set have end=done;
  if _N_=1 then do;
    declare hash h;
    declare hiter i;
    length var $32 value $50;
    retain order 'A';  /* Ascending/Descending, A means highest values, D lowest */
    declare hash tree();
    tree.definekey('var');
    tree.definedata('h','i');
    tree.definedone();
    do var=&vars;
      h=_new_ hash(multidata: 'Y', ordered: order);
      h.definekey(var);
      h.definedata(var,&id);
      h.definedone();
      i=_new_ hiter('h');
      rc=tree.add();
      end;
    end;
  do var=&vars;
    rc=tree.find();
    rc=h.add();
    if h.num_items>&n then do;
      rc=i.first();  /* get item to delete */
      rc=i.prev(); /* release iterator so that we can remove item */
      rc=h.remove();
      end;
    if done then do;
      rc=i.last();  /* get highest/lowest item */
      do rank=1 by 1 until(i.prev());
        value=vvaluex(var); /* get formatted value */
        output;
        end;
      end;
    end;
  keep desc id var value order rank;
run;

 I use the VVALUEX function to return formatted values. That way you can have the character representations of different variable types (e.g. character, money, integer, date) in the same character variable. If you need to find lowest values, change the ORDER variable to D.

Respected Advisor
Posts: 3,279

Re: Extract Top 10 Records of each column in an original Table.

You could run PROC RANK one time on all variables of interest.

 

Then it's a simple sort of the ranks to give the top N in each column.

--
Paige Miller
Ask a Question
Discussion stats
  • 7 replies
  • 168 views
  • 1 like
  • 6 in conversation