Is there a way to pull the variable column header based on the lowest value within a dataset? Such as:
Have1 | Have2 | Have3 | Want |
100 | 200 | 250 | Have1 |
200 | 100 | 275 | Have2 |
300 | 900 | 100 | Have3 |
250 | 25 | 120 | Have2 |
50 | 8 | 65 | Have2 |
400 | 700 | 200 | Have3 |
I have a very large number of variable to compare. Thanks in advance.
It sounds like you want to find the NAME of the variable with the smallest value.
In your case you want to search the variables named HAVE1 - HAVE3 .
You probably will want to define an array and then use the MIN(), WHICHN() and VNAME() functions.
First let's convert your LISTING into an actual DATASET.
data have;
input Have1-Have3 Expect :$32.;
cards;
100 200 250 Have1
200 100 275 Have2
300 900 100 Have3
250 25 120 Have2
50 8 65 Have2
400 700 200 Have3
;
Now we can run this data step to create the WANT variable.
data want;
set have;
array x have1-have3;
Want=vname(x[whichn(min(of x[*]),of x[*])]);
run;
Results:
Obs Have1 Have2 Have3 Expect Want 1 100 200 250 Have1 Have1 2 200 100 275 Have2 Have2 3 300 900 100 Have3 Have3 4 250 25 120 Have2 Have2 5 50 8 65 Have2 Have2 6 400 700 200 Have3 Have3
It sounds like you want to find the NAME of the variable with the smallest value.
In your case you want to search the variables named HAVE1 - HAVE3 .
You probably will want to define an array and then use the MIN(), WHICHN() and VNAME() functions.
First let's convert your LISTING into an actual DATASET.
data have;
input Have1-Have3 Expect :$32.;
cards;
100 200 250 Have1
200 100 275 Have2
300 900 100 Have3
250 25 120 Have2
50 8 65 Have2
400 700 200 Have3
;
Now we can run this data step to create the WANT variable.
data want;
set have;
array x have1-have3;
Want=vname(x[whichn(min(of x[*]),of x[*])]);
run;
Results:
Obs Have1 Have2 Have3 Expect Want 1 100 200 250 Have1 Have1 2 200 100 275 Have2 Have2 3 300 900 100 Have3 Have3 4 250 25 120 Have2 Have2 5 50 8 65 Have2 Have2 6 400 700 200 Have3 Have3
This is great. But naturally I've over simplified my example. My data would be more representative of:
data work.have;
input Alabama Arkansas Indiana;
datalines;
100 200 250
200 100 275
300 900 100
250 25 120
50 8 65
400 700 200
;
run;
Different data structure should only require a change to the ARRAY statement, to list the variables in the array.
Or if you can put all variables in the array, you could try something like:
array x _numeric_ ;
The idea of having variables names Alabama, Arkansas and Indiana seems opposite of the way SAS was designed to be most efficient. There's a reason why people say Long beats Wide, and normally it is because there is much less programming needed if your data is arrange properly, and probably the code will execute faster. In addition, people say data belongs in SAS variable values, not in variable names.
So here's a long data set with the same data, and with the state names as data instead of variable names. Note that you do not have to type in the variable names Alabama, Arkansas and Indiana into your code, and if there are other states in your data, the exact same code works.
data have;
input id value state $;
cards;
1 100 AL
1 200 AR
1 250 IN
2 200 AL
2 100 AR
2 275 IN
3 300 AL
3 900 AR
3 100 IN
4 250 AL
4 25 AR
4 120 IN
5 50 AL
5 8 AR
5 65 IN
6 400 AL
6 700 AR
6 200 IN
;
proc summary data=have nway;
class id;
var value;
output out=min min=minvalue minid(value(state))=min_state;
run;
Of course, in this very simple example, maybe the benefit isn't obvious, but in real world problems, long does beat wide, it works better in SAS because most SAS PROCs are designed to work on long data sets, and it will be less programming.
Just change the list of variables in the ARRAY statement.
If you know they all appear together in the dataset then you could use a positional variable list (what the documentation calls a "name range list") as opposed the simple variable list that works with numeric suffixed names (what the documentation calls a "numbered range variable list"). Run PROC CONTENTS with the VARNUM option to see the variable names listed in order of where they appear in the dataset.
So if the variables are in order from Alaska to Wyoming then the array statement could simply be:
array x Alaska -- Wyoming ;
Think about how you will handle ties, if ties are possible.
I think Tom's lovely solution would return the first variable in the array found by WHICHN.
Another solution is using SAS/IML , if you have it.
data have;
input Have1-Have3 Expect :$32.;
cards;
100 200 250 Have1
200 100 275 Have2
300 900 100 Have3
250 25 120 Have2
50 8 65 Have2
400 700 200 Have3
;
proc iml;
use have;
read all var _num_ into x[c=vname];
close;
want=vname[x[,>:<]];
create want from x want[c=(vname||'want')];
append from x want;
close;
quit;
Hi Folks!,
I hope this helps!.
This can be done using SAS arrays easily. Below code works fine to find the header of lowest value in a list.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.