I want to extract the name from a variable based on the highest value in the fields.
here is a sample of the data
id wht_rev blk_rev asn_rev iph_rev
b1 265 114 34 7
b2 65 218 5 22
b4 78 101 12 4
so i want to create a variable called ethnic based on the highest value from each row of the data set. the result should look like
id ethnic
b1 wht
b2 blk
b4 blk
I am not sure where to begin at all
Try this.
data have;
input id $ wht_rev blk_rev asn_rev iph_rev;
datalines;
b1 265 114 34 7
b2 65 218 5 22
b4 78 101 12 4
;
data want(keep = id ethnic);
set have;
array a wht_rev -- iph_rev;
idx = whichn(max(of a[*]), of a[*]);
ethnic = scan(vname(a[idx]), 1, '_');
run;
Result:
id ethnic b1 wht b2 blk b4 blk
data have;
input id $ wht_rev blk_rev asn_rev iph_rev;
cards;
b1 265 114 34 7
b2 65 218 5 22
b4 78 101 12 4
;
data want;
set have;
length want $32;
array t(*) wht_rev--iph_rev;
max=max(of t(*)); /*keep the max value in dataset*/
want=scan(vname(t(whichn(max,of t(*)))),1,'_');
run;
Please feel free to add a KEEP or DROP statement to choose what variables you need to remain or drop 🙂
HI @bibbnd Please be aware WHICHN is not really the greatest function if you have a very very wide dataset. That doesn't quite do the binary search as we would like. At least in my little experience with what I have tested so far, it's rather slow when we have to deal with very wide datasets.
Sometimes, cycling through the list is far cleaner, easier and surprisingly faster in my experience. So-
data have;
input id $ wht_rev blk_rev asn_rev iph_rev;
cards;
b1 265 114 34 7
b2 65 218 5 22
b4 78 101 12 4
;
data want;
set have;
array t(*) wht_rev--iph_rev;
max=max(of t(*));
length want $32;
do _n_=1 to dim(t);
if t(_n_)=max then do;
want=scan(vname(t(_n_)),1,'_');
leave;
end;
end;
run;
Agree, it's boring but it's convenient.
Try this.
data have;
input id $ wht_rev blk_rev asn_rev iph_rev;
datalines;
b1 265 114 34 7
b2 65 218 5 22
b4 78 101 12 4
;
data want(keep = id ethnic);
set have;
array a wht_rev -- iph_rev;
idx = whichn(max(of a[*]), of a[*]);
ethnic = scan(vname(a[idx]), 1, '_');
run;
Result:
id ethnic b1 wht b2 blk b4 blk
If you don't like data steps, you could use proc transpose + proc sort (twice):
proc transpose data=have out=transposed name=ethnic;
by id;
var _numeric_;
run;
proc sort data=transposed out=sorted;
by id descending col1;
run;
proc sort data=sorted out=want(drop=col1) nodupkey;
by id;
run;
EDIT: Processing data thrice is, of course, less efficient than then data-step-solution already provided 😉
@andreas_lds My SQL colleagues/folks here sitting besides me would rather have your solution as opposed to the rest. Not kidding, honestly here it seems the focus is ONLY credit risk loan performance and appetite and not SAS efficiency for some reason.
Rule: What is simple that produces results is business for us. So Kudos!!
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.