Solved: Re: How to append SAS Variables "efficiently"?

CharlotteCain · Posted 03-13-2015 06:39 AM

How to append SAS Variables in the sample dataset below: declaring array, transposing and transposing back to one seems to work, but seems tedious and inefficient..Any efficent method plz?

Have:/*what if there are too many variables? hence need efficiency*/

var1 var2 var3 var4

a f k p

b g l q

c h m r

d i n s

e j o t

Want:

var_final index

a 1

b 1

c 1

d 1

e 1

f 2

g 2

h 2

i 2

j 2

k 3

l 3

m 3

n 3

o 3

Haikuo · Posted 03-13-2015 10:14 AM

A little late for the party. If your RAM is large enough to hold the whole target table, then Hash may meet your requirement of 'efficiency'. Please note, I have reshuffled the order in your input to make sure the sort is working ONLY on the Index:

data vars;

length var1-var4 $1;

input var1-var4;

cards;

c h m r

a f k p

e j o t

b g l q

d i n s

;;;;

data _null_;

if _n_=1 then

do;

declare hash h(multidata:'y', ordered:'y');

h.definekey('index');

h.definedata('_var', 'index');

h.definedone();

end;

set vars end=last;

array var var:;

do over var;

_var=var;

index=(_i_);

rc=h.add();

end;

if last then

rc=h.output(dataset:'want');

run;

Good Luck,

Haikuo

View solution in original post

RW9 · Posted 03-13-2015 07:01 AM

Hi,

Well, for a flexible way (i.e. unknown number of vars):

data have;

var1='a'; var2="f"; var3="k"; var4="p"; output;

var1="b"; var2="g"; var3="l"; var4="q"; output;

run;

proc sql noprint;

select NAME

into :V_LIST separated by " "

from SASHELP.VCOLUMN

where LIBNAME="WORK"

and MEMNAME="HAVE";

quit;

data want (keep=var_final index);

set have;

array var{*} &V_LIST.;

do i=1 to dim(var);

var_final=var{i};

index=i;

output;

end;

run;

proc sort data=want;

by index var_final;

run;

data_null__ · Posted 03-13-2015 07:33 AM

You only need one PROC TRANSPOSE and a data step view to create INDEX. You don't specify but I assume your VARs are all the same data type. With PROC TRANSPOSE the type could be mixed but the variables are all converted to character in that scenario. This conversion can be very useful, as the numeric's are converted using their associated formats (think VVALUE), I use it often and is something that data step array solutions will struggle with (at least require more code).

data vars;
   length var1-var4 $1; 
   input var1-var4;
   cards; 
a f k p
b g l q
c h m r
d i n s
e j o t
;;;;
   run; 
proc print; 
   run; 
data varsv / view=varsv;
   set vars;
   index = _n_;
   run; 
proc transpose data=varsv out=vars2(drop=_name_  /*maybe not drop*/ rename=(col1=var_final));
   by index;
   var _character_; /*or other SAS Variable List */
   run; 
proc print; 
   run;

CharlotteCain · Posted 03-13-2015 08:49 AM

Hi John and Robert, Thank you for the responses. I ran both of your codes, however I am afraid neither of the results are appending the variables as wanted one below the other, rather it's appending the values of variables by the observation.

If you agree with me, your results are the same as what the below code produces

data vars;

length var1-var4 $1;

input var1-var4;

cards;

a f k p

b g l q

c h m r

d i n s

e j o t

;;;;

data want;

set vars;

do i=var1,var2,var3,var4;

index=_n_;

output;

end;

keep i index;

run;

:

Please accept my apologies if I didn't explain well enough while you guys are offering great help. Basically, I need to append columns one below the other as is i.e even the last sort step in RW9's code will not meet the requirement because the values need not be a,b, c, d based sequence or any other predictable sequence. In other words, its just an append, however the challenge is that there are too many variables. Hence the scalability of the code also matters big time.

Sincerely,

Charlotte

RW9 · Posted 03-13-2015 08:55 AM

Hi,

Just don't do the final sort then, I only put that in to match the order given.

CharlotteCain · Posted 03-13-2015 09:07 AM

Well Sir, Then it goes back to square as your result will also join as DN's and mine:

a 1

f 2

k 3

p 4

b 1

g 2

l 3

q 4

which unfortunately doesn't meet the wanted output.

RW9 · Posted 03-13-2015 09:33 AM

There are solutions above.

I am interested in the purpose of this though. If, as you say there is no logical arrangement of data other than a position in a dataset, how do you intend to process that further, or cover the fact that the position will most likely change? To my mind sorts are used so frequently purely to arrange the data in a logical manner. I can understand that if you have a requirement to have Cat F above Cat C which itself is above Cat B then the data gives no logical order, but in those instances I would be assigning sort variables to be used behind the scenes and not dsiplayed, i.e.

SEQ1 TEXT

1 CAT F

2 CAT C

3 CAT F

That way I can always get the sort I intend.

data_null__ · Posted 03-13-2015 09:21 AM

I see I did miss that detail. You could gen some code and use SET. I'm not sure the limit of the number of data sets that can be SET petty big so it should work with lots of variables. As written using rename the variables must be all the same type.

data vars;
   length var1-var4 $1; 
   input var1-var4;
   cards; 
a f k p
b g l q
c h m r
d i n s
e j o t
;;;;
   run; 
proc print; 
   run; 
proc transpose data=vars(obs=0) out=names;
   var var:;
   run; 
filename FT61F001 temp; 
data _null_; 
   file FT61F001;
   set names end=eof;
   if _n_ eq 1 then put 'Set'; 
   length in $8; 
   in = cats('_in',_n_);
   put +3 'vars(' in= 'keep=' _name_ 'Rename=(' _name_ +(-1) '=final))'; 
   if eof then put +3 ';'; 
   run; 

data appended;
   %include FT61F001 / source2;
   index = findc(cats(of _in:),'1');
   run; 
proc print; 
   run;

Astounding · Posted 03-13-2015 09:39 AM

Charlotte,

The approach you listed (as well as the RW9 approach) should work fine, but you need to change the sorted order. Just remove one variable name from the final sort, and sort BY INDEX;

That should give you the proper order.

Good luck.

Ksharp · Posted 03-13-2015 09:02 AM

Alternative way :

data have; 
  var1='a'; var2="f"; var3="k"; var4="p"; output;
  var1="b"; var2="g"; var3="l"; var4="q"; output;
run;
data _null_;
 set sashelp.vcolumn(keep=libname memname name where=(libname='WORK' and memname='HAVE'));
 n+1;
 call execute('data temp;set');
 call execute(catt('have(keep=',name,' rename=(',name,'=var_final));index=',n,';run;proc append base=want data=temp force;run;'));
run;

Xia Keshan

Haikuo · Posted 03-13-2015 10:14 AM

A little late for the party. If your RAM is large enough to hold the whole target table, then Hash may meet your requirement of 'efficiency'. Please note, I have reshuffled the order in your input to make sure the sort is working ONLY on the Index:

data vars;

length var1-var4 $1;

input var1-var4;

cards;

c h m r

a f k p

e j o t

b g l q

d i n s

;;;;

data _null_;

if _n_=1 then

do;

declare hash h(multidata:'y', ordered:'y');

h.definekey('index');

h.definedata('_var', 'index');

h.definedone();

end;

set vars end=last;

array var var:;

do over var;

_var=var;

index=(_i_);

rc=h.add();

end;

if last then

rc=h.output(dataset:'want');

run;

Good Luck,

Haikuo

data_null__ · Posted 03-13-2015 10:41 AM

You can declare the index for your implicit array so you don't have to assign INDEX=_I_;

array var(index) var:;

DO OVER VAR; will increment INDEX instead of _I_.

Haikuo · Posted 03-13-2015 10:47 AM

Thanks, John. Learn some!

CharlotteCain · Posted 03-13-2015 01:24 PM

A True Class act!, Brilliant. Late to the party, Well, but first to open the champagne bottle. Yes , thanks to our healthy budget we have great RAM capacity. Thanks everyone, 1.genius and good friend- Xia 2. RW9 3. Robert Virgile-Astounding "I have bought a copy of your macro magic book" 4. John-Data _null_.

Thanks for making my evening a pleasant one indeed!!!.

Good night from Liverpool England,

Charlotte

FriedEgg · Posted 03-13-2015 04:43 PM

I have not tested with multiple threads. A single thread for DS2 will be slightly less efficient than data step, but with a large enough problem the multi-thread should provide performance lift in this problem.

data foo;

drop _:;

array var[500];

do _i=1 to 1000000;

do _j=1 to dim(var);

var[_j]=_j;

end;

output;

end;

run;

proc ds2;

thread transpose / overwrite=yes;

declare double index _var;

vararray double var

var1-var500;

keep index _var;

method run();

set x.foo;

do index=1 to dim(var);

_var=var[index];

output;

end;

endthread;

run;

data bar(overwrite=yes);

dcl thread transpose tranp;

method run();

set from tranp threads=1;

end;

enddata;

run;

quit;

data bar2;

keep index _var;

set foo;

array var(index) var:;

do over var;

_var=var;

output;

end;

run;

Registration is open

SAS Training: Just a Click Away