- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I want to extract the name from a variable based on the highest value in the fields.
here is a sample of the data
id wht_rev blk_rev asn_rev iph_rev
b1 265 114 34 7
b2 65 218 5 22
b4 78 101 12 4
so i want to create a variable called ethnic based on the highest value from each row of the data set. the result should look like
id ethnic
b1 wht
b2 blk
b4 blk
I am not sure where to begin at all
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try this.
data have;
input id $ wht_rev blk_rev asn_rev iph_rev;
datalines;
b1 265 114 34 7
b2 65 218 5 22
b4 78 101 12 4
;
data want(keep = id ethnic);
set have;
array a wht_rev -- iph_rev;
idx = whichn(max(of a[*]), of a[*]);
ethnic = scan(vname(a[idx]), 1, '_');
run;
Result:
id ethnic b1 wht b2 blk b4 blk
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input id $ wht_rev blk_rev asn_rev iph_rev;
cards;
b1 265 114 34 7
b2 65 218 5 22
b4 78 101 12 4
;
data want;
set have;
length want $32;
array t(*) wht_rev--iph_rev;
max=max(of t(*)); /*keep the max value in dataset*/
want=scan(vname(t(whichn(max,of t(*)))),1,'_');
run;
Please feel free to add a KEEP or DROP statement to choose what variables you need to remain or drop 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
HI @bibbnd Please be aware WHICHN is not really the greatest function if you have a very very wide dataset. That doesn't quite do the binary search as we would like. At least in my little experience with what I have tested so far, it's rather slow when we have to deal with very wide datasets.
Sometimes, cycling through the list is far cleaner, easier and surprisingly faster in my experience. So-
data have;
input id $ wht_rev blk_rev asn_rev iph_rev;
cards;
b1 265 114 34 7
b2 65 218 5 22
b4 78 101 12 4
;
data want;
set have;
array t(*) wht_rev--iph_rev;
max=max(of t(*));
length want $32;
do _n_=1 to dim(t);
if t(_n_)=max then do;
want=scan(vname(t(_n_)),1,'_');
leave;
end;
end;
run;
Agree, it's boring but it's convenient.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try this.
data have;
input id $ wht_rev blk_rev asn_rev iph_rev;
datalines;
b1 265 114 34 7
b2 65 218 5 22
b4 78 101 12 4
;
data want(keep = id ethnic);
set have;
array a wht_rev -- iph_rev;
idx = whichn(max(of a[*]), of a[*]);
ethnic = scan(vname(a[idx]), 1, '_');
run;
Result:
id ethnic b1 wht b2 blk b4 blk
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you don't like data steps, you could use proc transpose + proc sort (twice):
proc transpose data=have out=transposed name=ethnic;
by id;
var _numeric_;
run;
proc sort data=transposed out=sorted;
by id descending col1;
run;
proc sort data=sorted out=want(drop=col1) nodupkey;
by id;
run;
EDIT: Processing data thrice is, of course, less efficient than then data-step-solution already provided 😉
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@andreas_lds My SQL colleagues/folks here sitting besides me would rather have your solution as opposed to the rest. Not kidding, honestly here it seems the focus is ONLY credit risk loan performance and appetite and not SAS efficiency for some reason.
Rule: What is simple that produces results is business for us. So Kudos!!