Dear SAS experts,
I'm wondering if there is an easy way to use the value a variable to directly refer to other variables.
For example, given the data set below
data have;
input varIndex var_1 var_2 var_3 var_4;
cards;
1 . . . .
3 . . . .
2 . . . .
;
run;
I would like to assign 1 to var_i where i is given by the value of varIndex such that the desired output will be
data want;
input varIndex var_1 var_2 var_3 var_4;
cards;
1 1 . . .
3 . . 1 .
2 . 1 . .
;
run;
I've tried
data want;
set have;
array varArray (4) var_1 - var_4;
do i = 1 to 4;
if i = varIndex then varArray(i) = 1;
end;
run;
and it worked.
I'm wondering instead of using a do loop to search for the variable I want to refer, is there a more efficient and direct way (e.g. if I have var_1 - var_100000) to achieve that?
Thanks for your help.
Loop once and retain with temp array
data have;
input varName$ a b c d;
cards;
a . . . .
c . . . .
d . . . .
;
run;
data want;
set have;
array t(*) a--d;
array j(4) $ _temporary_;
if _n_=1 then
do i=1 to dim(t);
j(i)=vname(t(i));
end;
k=whichc(varname,of j(*));
t(k)=1;
drop k i;
run;
You don't need the loop, you can simplify your code to
data have;
input varIndex var_1 var_2 var_3 var_4;
cards;
1 . . . .
3 . . . .
2 . . . .
;
run;
data want;
set have;
array varArray (4) var_1 - var_4;
varArray[varIndex] = 1;
run;
Loop once and retain with temp array
data have;
input varName$ a b c d;
cards;
a . . . .
c . . . .
d . . . .
;
run;
data want;
set have;
array t(*) a--d;
array j(4) $ _temporary_;
if _n_=1 then
do i=1 to dim(t);
j(i)=vname(t(i));
end;
k=whichc(varname,of j(*));
t(k)=1;
drop k i;
run;
Use a simple transpose:
data have;
input varName$ a b c d;
cards;
a . . . .
c . . . .
d . . . .
;
run;
data intermed;
set have;
value = 1;
num = _n_;
run;
proc transpose data=intermed out=want (drop=_name_ num);
var value;
id varname;
by num;
copy varname;
run;
I wonder how that's efficient if OP's stated objective is efficiency?
Define "efficiency".
For me, code that is completely data-driven, and needs no changes when incoming data changes, is most efficient.
Depending on the way the data is created in the first place, the intermediate step might not be necessary at all. What is important here is that the wanted action is basically a transpose.
I tested your code with a very very large dataset at my insurance company where we deal with humongous amounts of data and it's much slower than the array based solution I am afraid.
@Andygray wrote:
I tested your code with a very very large dataset at my insurance company where we deal with humongous amounts of data and it's much slower than the array based solution I am afraid.
No problem with that. Maxim 29 leads to the simplest and easiest to maintain solution, but if following Maxim 30 reveals a real bottleneck, the more customized solution will make sense.
@Andygray wrote:
I tested your code with a very very large dataset at my insurance company where we deal with humongous amounts of data and it's much slower than the array based solution I am afraid.
It might help to clarify this comment by including the example code as it may not be obvious which you mean.
Quite often Proc Transpose is not a rapid solution for processes involving "humongous" data. So I am guessing that this refers to the transpose approach.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.