Solved: Re: Select a column with the highest value

bibbnd · Posted 01-20-2021 08:43 AM

I want to extract the name from a variable based on the highest value in the fields.

here is a sample of the data

id wht_rev blk_rev asn_rev iph_rev

b1 265 114 34 7

b2 65 218 5 22

b4 78 101 12 4

so i want to create a variable called ethnic based on the highest value from each row of the data set. the result should look like

id ethnic

b1 wht

b2 blk

b4 blk

I am not sure where to begin at all

PeterClemmensen · Posted 01-20-2021 08:55 AM

Try this.

data have;
input id $ wht_rev blk_rev asn_rev iph_rev;
datalines;
b1 265 114 34 7
b2 65 218 5 22
b4 78 101 12 4
;

data want(keep = id ethnic);
   set have;
   array a wht_rev -- iph_rev;
   idx = whichn(max(of a[*]), of a[*]);
   ethnic = scan(vname(a[idx]), 1, '_');
run;

Result:

id  ethnic 
b1  wht 
b2  blk 
b4  blk

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

novinosrin · Posted 01-20-2021 08:52 AM


data have;
input   id $  wht_rev  blk_rev  asn_rev iph_rev;
cards;
   b1    265        114          34        7

   b2    65          218         5          22

   b4    78          101        12          4
   ;

data want;
 set have;
 length want $32;
 array t(*) wht_rev--iph_rev;
 max=max(of t(*)); /*keep the max value in dataset*/
 want=scan(vname(t(whichn(max,of t(*)))),1,'_');
run;

Please feel free to add a KEEP or DROP statement to choose what variables you need to remain or drop 🙂

novinosrin · Posted 01-20-2021 09:08 AM

HI @bibbnd Please be aware WHICHN is not really the greatest function if you have a very very wide dataset. That doesn't quite do the binary search as we would like. At least in my little experience with what I have tested so far, it's rather slow when we have to deal with very wide datasets.

Sometimes, cycling through the list is far cleaner, easier and surprisingly faster in my experience. So-


data have;
input   id $  wht_rev  blk_rev  asn_rev iph_rev;
cards;
   b1    265        114          34        7

   b2    65          218         5          22

   b4    78          101        12          4
   ;

data want;
 set have;
 array t(*) wht_rev--iph_rev;
 max=max(of t(*)); 
 length want $32;
 do _n_=1 to dim(t);
  if t(_n_)=max then do;
    want=scan(vname(t(_n_)),1,'_');
	leave;
  end;
 end;
run;

Agree, it's boring but it's convenient.

PeterClemmensen · Posted 01-20-2021 08:55 AM

Try this.

data have;
input id $ wht_rev blk_rev asn_rev iph_rev;
datalines;
b1 265 114 34 7
b2 65 218 5 22
b4 78 101 12 4
;

data want(keep = id ethnic);
   set have;
   array a wht_rev -- iph_rev;
   idx = whichn(max(of a[*]), of a[*]);
   ethnic = scan(vname(a[idx]), 1, '_');
run;

Result:

id  ethnic 
b1  wht 
b2  blk 
b4  blk

The DATA to DATA Step Macro
Blog: SASnrd

andreas_lds · Posted 01-20-2021 10:00 AM

If you don't like data steps, you could use proc transpose + proc sort (twice):

proc transpose data=have out=transposed name=ethnic;
	by id;
	var _numeric_;
run;

proc sort data=transposed out=sorted;
	by id descending col1;
run;

proc sort data=sorted out=want(drop=col1) nodupkey;
	by id;
run;

EDIT: Processing data thrice is, of course, less efficient than then data-step-solution already provided 😉

novinosrin · Posted 01-20-2021 10:07 AM

@andreas_lds My SQL colleagues/folks here sitting besides me would rather have your solution as opposed to the rest. Not kidding, honestly here it seems the focus is ONLY credit risk loan performance and appetite and not SAS efficiency for some reason.

Rule: What is simple that produces results is business for us. So Kudos!!

SAS Innovate 2025: Call for Content

Classroom Training Available!