How to append SAS Variables in the sample dataset below: declaring array, transposing and transposing back to one seems to work, but seems tedious and inefficient..Any efficent method plz?
Have:/*what if there are too many variables? hence need efficiency*/
var1 var2 var3 var4
a f k p
b g l q
c h m r
d i n s
e j o t
Want:
var_final index
a 1
b 1
c 1
d 1
e 1
f 2
g 2
h 2
i 2
j 2
k 3
l 3
m 3
n 3
o 3
A little late for the party. If your RAM is large enough to hold the whole target table, then Hash may meet your requirement of 'efficiency'. Please note, I have reshuffled the order in your input to make sure the sort is working ONLY on the Index:
data vars;
length var1-var4 $1;
input var1-var4;
cards;
c h m r
a f k p
e j o t
b g l q
d i n s
;;;;
data _null_;
if _n_=1 then
do;
declare hash h(multidata:'y', ordered:'y');
h.definekey('index');
h.definedata('_var', 'index');
h.definedone();
end;
set vars end=last;
array var var:;
do over var;
_var=var;
index=(_i_);
rc=h.add();
end;
if last then
rc=h.output(dataset:'want');
run;
Good Luck,
Haikuo
Hi,
Well, for a flexible way (i.e. unknown number of vars):
data have;
var1='a'; var2="f"; var3="k"; var4="p"; output;
var1="b"; var2="g"; var3="l"; var4="q"; output;
run;
proc sql noprint;
select NAME
into :V_LIST separated by " "
from SASHELP.VCOLUMN
where LIBNAME="WORK"
and MEMNAME="HAVE";
quit;
data want (keep=var_final index);
set have;
array var{*} &V_LIST.;
do i=1 to dim(var);
var_final=var{i};
index=i;
output;
end;
run;
proc sort data=want;
by index var_final;
run;
You only need one PROC TRANSPOSE and a data step view to create INDEX. You don't specify but I assume your VARs are all the same data type. With PROC TRANSPOSE the type could be mixed but the variables are all converted to character in that scenario. This conversion can be very useful, as the numeric's are converted using their associated formats (think VVALUE), I use it often and is something that data step array solutions will struggle with (at least require more code).
Hi John and Robert, Thank you for the responses. I ran both of your codes, however I am afraid neither of the results are appending the variables as wanted one below the other, rather it's appending the values of variables by the observation.
If you agree with me, your results are the same as what the below code produces
data vars;
length var1-var4 $1;
input var1-var4;
cards;
a f k p
b g l q
c h m r
d i n s
e j o t
;;;;
data want;
set vars;
do i=var1,var2,var3,var4;
index=_n_;
output;
end;
keep i index;
run;
:
Please accept my apologies if I didn't explain well enough while you guys are offering great help. Basically, I need to append columns one below the other as is i.e even the last sort step in RW9's code will not meet the requirement because the values need not be a,b, c, d based sequence or any other predictable sequence. In other words, its just an append, however the challenge is that there are too many variables. Hence the scalability of the code also matters big time.
Sincerely,
Charlotte
Hi,
Just don't do the final sort then, I only put that in to match the order given.
Well Sir, Then it goes back to square as your result will also join as DN's and mine:
a 1
f 2
k 3
p 4
b 1
g 2
l 3
q 4
which unfortunately doesn't meet the wanted output.
There are solutions above.
I am interested in the purpose of this though. If, as you say there is no logical arrangement of data other than a position in a dataset, how do you intend to process that further, or cover the fact that the position will most likely change? To my mind sorts are used so frequently purely to arrange the data in a logical manner. I can understand that if you have a requirement to have Cat F above Cat C which itself is above Cat B then the data gives no logical order, but in those instances I would be assigning sort variables to be used behind the scenes and not dsiplayed, i.e.
SEQ1 TEXT
1 CAT F
2 CAT C
3 CAT F
That way I can always get the sort I intend.
I see I did miss that detail. You could gen some code and use SET. I'm not sure the limit of the number of data sets that can be SET petty big so it should work with lots of variables. As written using rename the variables must be all the same type.
Charlotte,
The approach you listed (as well as the RW9 approach) should work fine, but you need to change the sorted order. Just remove one variable name from the final sort, and sort BY INDEX;
That should give you the proper order.
Good luck.
Alternative way :
data have; var1='a'; var2="f"; var3="k"; var4="p"; output; var1="b"; var2="g"; var3="l"; var4="q"; output; run; data _null_; set sashelp.vcolumn(keep=libname memname name where=(libname='WORK' and memname='HAVE')); n+1; call execute('data temp;set'); call execute(catt('have(keep=',name,' rename=(',name,'=var_final));index=',n,';run;proc append base=want data=temp force;run;')); run;
Xia Keshan
A little late for the party. If your RAM is large enough to hold the whole target table, then Hash may meet your requirement of 'efficiency'. Please note, I have reshuffled the order in your input to make sure the sort is working ONLY on the Index:
data vars;
length var1-var4 $1;
input var1-var4;
cards;
c h m r
a f k p
e j o t
b g l q
d i n s
;;;;
data _null_;
if _n_=1 then
do;
declare hash h(multidata:'y', ordered:'y');
h.definekey('index');
h.definedata('_var', 'index');
h.definedone();
end;
set vars end=last;
array var var:;
do over var;
_var=var;
index=(_i_);
rc=h.add();
end;
if last then
rc=h.output(dataset:'want');
run;
Good Luck,
Haikuo
You can declare the index for your implicit array so you don't have to assign INDEX=_I_;
Thanks, John. Learn some!
A True Class act!, Brilliant. Late to the party, Well, but first to open the champagne bottle. Yes , thanks to our healthy budget we have great RAM capacity. Thanks everyone, 1.genius and good friend- Xia 2. RW9 3. Robert Virgile-Astounding "I have bought a copy of your macro magic book" 4. John-Data _null_.
Thanks for making my evening a pleasant one indeed!!!.
Good night from Liverpool England,
Charlotte
I have not tested with multiple threads. A single thread for DS2 will be slightly less efficient than data step, but with a large enough problem the multi-thread should provide performance lift in this problem.
data foo;
drop _:;
array var[500];
do _i=1 to 1000000;
do _j=1 to dim(var);
var[_j]=_j;
end;
output;
end;
run;
proc ds2;
thread transpose / overwrite=yes;
declare double index _var;
vararray double var
keep index _var;
method run();
set x.foo;
do index=1 to dim(var);
_var=var[index];
output;
end;
end;
endthread;
run;
data bar(overwrite=yes);
dcl thread transpose tranp;
method run();
set from tranp threads=1;
end;
enddata;
run;
quit;
data bar2;
keep index _var;
set foo;
array var(index) var:;
do over var;
_var=var;
output;
end;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.