Hello
What is the way to create data set want from data set have via proc transpose(or other way)?
data have; input tbl $ field $; cards; tbl1 x tbl1 y tbl1 z tbl2 w tbl2 x tbl2 v tbl2 y ; run; data want; input tbl1 $ tbl2 $ ; cards; x w y x z v . y ; Run;
data have;
input tbl $ field $;
cards;
tbl1 x
tbl1 y
tbl1 z
tbl2 w
tbl2 x
tbl2 v
tbl2 y
;
run;
proc sort data=have;
by tbl;
run;
proc freq data=have order=data noprint;
by tbl;
table field/out=temp;
run;
proc transpose data=temp out=temp2;
by tbl;
var field;
run;
proc transpose data=temp2 out=want(drop=_:);
var col:;
id tbl;
run;
Are your real variable values for tbl sequentially numbered with common name? If not do you want the actual text of the "tbl" to be the name of the resulting variable? And if that is the case have you verified than all of the values of "tbl" are valid variable names prior to this step?
If so this may be closer to what you want: (Basically the same as @Ksharp but without a step that forces the numeric naming).
This assumes that the values of "field" are grouped by the actual "tbl", as in the likely output from requesting the names of all the tables and variables from a library from the dictionary tables. Note that depending on the setting of VALIDMEMNAME and VALIDVARNAME when that library was created and this code is submitted you might have table names that are not acceptable as variable names.
proc transpose data=have out=trans1 ; by tbl notsorted; var field; run; proc transpose data=trans1 out=want (drop=_name_) ; by _name_; var col: ; id tbl; run;
I would skip the transpose steps and just do my own code generation.
filename code temp;
data _null_;
set have end=eof;
by tbl;
file code;
if _n_=1 then put 'merge';
if first.tbl then put
' have(rename=(field=' tbl ') where=(' tbl =:$quote. '))'
;
if eof then put ';' ;
run;
data want;
%include code / source2;
drop tbl ;
run;
Here is the step that is generated for this example:
13 data want; 14 %include code / source2; NOTE: %INCLUDE (level 1) file CODE is (system-specific pathname). 15 +merge 16 + have(rename=(field=tbl1 ) where=(tbl="tbl1" )) 17 + have(rename=(field=tbl2 ) where=(tbl="tbl2" )) 18 +; NOTE: %INCLUDE (level 1) ending. 19 drop tbl ; 20 run; NOTE: The data set WORK.WANT has 4 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.02 seconds cpu time 0.00 seconds
If you wanted to use PROC TRANSPOSE you would need to add the missing row identifier that uniquely identifies how to pair the values for field for TBL1 with the values of field for TBL2. And then reorder the data by this row identifier.
data step1;
do row=1 by 1 until(last.tbl);
set have;
by tbl;
output;
end;
run;
proc sort data=step1;
by row tbl ;
run;
Result
OBS row tbl field 1 1 tbl1 x 2 1 tbl2 w 3 2 tbl1 y 4 2 tbl2 x 5 3 tbl1 z 6 3 tbl2 v 7 4 tbl2 y
Now you can transpose it.
proc transpose data=step1 out=want(drop=_name_);
by row;
id tbl;
var field;
run;
To get
OBS row tbl1 tbl2 1 1 x w 2 2 y x 3 3 z v 4 4 y
Tom,
No need to make a SEQ variable, proc transpose would take care of it.
data have;
input tbl $ field $;
cards;
tbl1 x
tbl1 y
tbl1 z
tbl2 w
tbl2 x
tbl2 v
tbl2 y
;
run;
proc transpose data=have out=temp2;
by tbl;
var field;
run;
proc transpose data=temp2 out=want(drop=_:);
var col:;
id tbl;
run;
Does it work when there are more than 10?
Does the fact that the COL variables will be named COL1 ... COL9 COL10 COL11 (and so will sort into COL1 COL10 COL11 COL2 COL3 ... ) matter?
Tom,
I don't understand what you mean.
COLn has nothing to do with sorting 'COL1', 'COL2', ......
Here my last proc transpose is unlike ballardw 's code . I did not use "BY _NAME_;".
Check this:
data have; input tbl $ field $; cards; tbl1 c tbl1 b tbl1 b tbl1 a tbl1 a tbl1 a tbl2 w tbl2 x tbl2 d tbl2 a tbl2 w tbl2 x tbl2 d tbl2 a tbl2 w tbl2 x tbl2 d tbl2 a tbl2 w tbl2 x tbl2 d tbl2 a ; run; proc transpose data=have out=temp2; by tbl; var field; run; proc transpose data=temp2 out=want(drop=_:); var col:; id tbl; run;
This is for the data you posted:
data want;
length tbl1 tbl2 $8;
call missing(of tbl:);
merge
have (where=(tbl="tbl1") rename=(field=tbl1))
have (where=(tbl="tbl2") rename=(field=tbl2))
;
drop tbl;
run;
For more values of tbl, run a preliminary step to get the list of values, and use macro %DO loops to create the code dynamically:
%macro want;
%local tables tbl i;
proc sql noprint;
select distinct tbl into :tables separated by " "
from have;
quit;
data want;
length
&tables
$8
;
call missing(of tbl:);
merge
%do i = 1 %to %sysfunc(countw(&tables));
%let tbl = %scan(&tables,&i);
have (where=(tbl="&tbl.") rename=(field=&tbl.))
%end;
;
drop tbl;
run;
%mend;
%want
If you already know the fixed values of variable TBL, then a merge of subsets of HAVE (one subset per TBL value), each with a rename, makes this into a single pass DATA step:
data have;
input tbl $ field $;
cards;
tbl1 x
tbl1 y
tbl1 z
tbl2 w
tbl2 x
tbl2 v
tbl2 y
run;
data want;
merge have (where=(tbl='tbl1') rename=(field=tlb1))
have (where=(tbl='tbl2') rename=(field=tlb2)) ;
drop tbl;
run;
@LinusH wrote:
I can't see that you can do this without losing data consistency. It seems now that you put values on the same row based on the input data order.
What is the meaning of each row in the new table?
Good point. It is hard to make sense of the new table, without additional explanation.
Here is another possible way to do this:
data have;
input tbl $ field $;
cards;
tbl1 x
tbl1 y
tbl1 z
tbl2 w
tbl2 x
tbl2 v
tbl2 y
;
run;
data have;
set have;
by tbl;
if first.tbl then flag=0;
flag+1;
run;
proc sort data=have;
by flag tbl;
run;
proc transpose data=have out=want(drop=flag _name_);
by flag;
id tbl;
var field;
run;
proc print data=want;
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.