Long to wide

Ronein · Posted 05-18-2025 12:44 AM

Hello

What is the way to create data set want from data set have via proc transpose(or other way)?

data have;
input tbl $ field $;
cards;
tbl1 x
tbl1 y
tbl1 z
tbl2 w
tbl2 x
tbl2 v
tbl2 y
;
run;

data want;
input tbl1 $ tbl2 $ ;
cards;
x w
y x
z v
. y
;
Run;

Ksharp · Posted 05-18-2025 02:03 AM

data have;
input tbl $ field $;
cards;
tbl1 x
tbl1 y
tbl1 z
tbl2 w
tbl2 x
tbl2 v
tbl2 y
;
run;
proc sort data=have;
by tbl;
run;
proc freq data=have order=data noprint;
by tbl;
table field/out=temp;
run;
proc transpose data=temp out=temp2;
by tbl;
var field;
run;
proc transpose data=temp2 out=want(drop=_:);
var col:;
id tbl;
run;

ballardw · Posted 05-18-2025 02:51 PM

Are your real variable values for tbl sequentially numbered with common name? If not do you want the actual text of the "tbl" to be the name of the resulting variable? And if that is the case have you verified than all of the values of "tbl" are valid variable names prior to this step?

If so this may be closer to what you want: (Basically the same as @Ksharp but without a step that forces the numeric naming).

This assumes that the values of "field" are grouped by the actual "tbl", as in the likely output from requesting the names of all the tables and variables from a library from the dictionary tables. Note that depending on the setting of VALIDMEMNAME and VALIDVARNAME when that library was created and this code is submitted you might have table names that are not acceptable as variable names.

proc transpose data=have out=trans1
   ;
  by tbl notsorted;
  var field;
run;

proc transpose data=trans1 out=want (drop=_name_)
     ;
  by _name_;
  var col: ;
  id tbl;
run;

Tom · Posted 05-18-2025 03:47 PM

I would skip the transpose steps and just do my own code generation.

filename code temp;
data _null_;
  set have end=eof;
  by tbl;
  file code;
  if _n_=1 then put 'merge';
  if first.tbl then put
   '  have(rename=(field=' tbl ') where=(' tbl =:$quote. '))'
  ;
  if eof then put ';' ;
run;

data want;
%include code / source2;
  drop tbl ;
run;

Here is the step that is generated for this example:

13   data want;
14   %include code / source2;
NOTE: %INCLUDE (level 1) file CODE is (system-specific pathname).
15  +merge
16  +  have(rename=(field=tbl1 ) where=(tbl="tbl1" ))
17  +  have(rename=(field=tbl2 ) where=(tbl="tbl2" ))
18  +;
NOTE: %INCLUDE (level 1) ending.
19     drop tbl ;
20   run;

NOTE: The data set WORK.WANT has 4 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.00 seconds

Tom · Posted 05-18-2025 03:59 PM

If you wanted to use PROC TRANSPOSE you would need to add the missing row identifier that uniquely identifies how to pair the values for field for TBL1 with the values of field for TBL2. And then reorder the data by this row identifier.

data step1;
  do row=1 by 1 until(last.tbl);
    set have;
    by tbl;
    output;
  end;
run;

proc sort data=step1;
  by row tbl ;
run;

Result

OBS    row    tbl     field

 1      1     tbl1      x
 2      1     tbl2      w
 3      2     tbl1      y
 4      2     tbl2      x
 5      3     tbl1      z
 6      3     tbl2      v
 7      4     tbl2      y

Now you can transpose it.

proc transpose data=step1 out=want(drop=_name_);
  by row;
  id tbl;
  var field;
run;

To get

OBS    row    tbl1    tbl2

 1      1      x       w
 2      2      y       x
 3      3      z       v
 4      4              y

Ksharp · Posted 05-18-2025 09:20 PM

Tom,

No need to make a SEQ variable, proc transpose would take care of it.

data have;
input tbl $ field $;
cards;
tbl1 x
tbl1 y
tbl1 z
tbl2 w
tbl2 x
tbl2 v
tbl2 y
;
run;

proc transpose data=have out=temp2;
by tbl;
var field;
run;
proc transpose data=temp2 out=want(drop=_:);
var col:;
id tbl;
run;

Tom · Posted 05-18-2025 09:24 PM

Does it work when there are more than 10?

Does the fact that the COL variables will be named COL1 ... COL9 COL10 COL11 (and so will sort into COL1 COL10 COL11 COL2 COL3 ... ) matter?

Ksharp · Posted 05-18-2025 09:35 PM

Tom,

I don't understand what you mean.

COLn has nothing to do with sorting 'COL1', 'COL2', ......

Here my last proc transpose is unlike ballardw 's code . I did not use "BY _NAME_;".

Check this:

data have;
input tbl $ field $;
cards;
tbl1 c
tbl1 b
tbl1 b
tbl1 a
tbl1 a
tbl1 a
tbl2 w
tbl2 x
tbl2 d
tbl2 a
tbl2 w
tbl2 x
tbl2 d
tbl2 a
tbl2 w
tbl2 x
tbl2 d
tbl2 a
tbl2 w
tbl2 x
tbl2 d
tbl2 a
;
run;

proc transpose data=have out=temp2;
by tbl;
var field;
run;
proc transpose data=temp2 out=want(drop=_:);
var col:;
id tbl;
run;

Kurt_Bremser · Posted 05-19-2025 04:50 AM

This is for the data you posted:

data want;
length tbl1 tbl2 $8;
call missing(of tbl:);
merge
  have (where=(tbl="tbl1") rename=(field=tbl1))
  have (where=(tbl="tbl2") rename=(field=tbl2))
;
drop tbl;
run;

For more values of tbl, run a preliminary step to get the list of values, and use macro %DO loops to create the code dynamically:

%macro want;
%local tables tbl i;

proc sql noprint;
select distinct tbl into :tables separated by " "
from have;
quit;
data want;
length
&tables
$8
;
call missing(of tbl:);
merge
%do i = 1 %to %sysfunc(countw(&tables));
  %let tbl = %scan(&tables,&i);
  have (where=(tbl="&tbl.") rename=(field=&tbl.))
%end;
;
drop tbl;
run;

%mend;

%want

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

mkeintz · Posted 05-20-2025 08:52 PM

If you already know the fixed values of variable TBL, then a merge of subsets of HAVE (one subset per TBL value), each with a rename, makes this into a single pass DATA step:

data have;
input tbl $ field $;
cards;
tbl1 x
tbl1 y
tbl1 z
tbl2 w
tbl2 x
tbl2 v
tbl2 y
run;

data want;
  merge have (where=(tbl='tbl1')  rename=(field=tlb1))
        have (where=(tbl='tbl2')  rename=(field=tlb2)) ;
  drop tbl;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

LinusH · Posted 05-21-2025 03:57 AM

I can't see that you can do this without losing data consistency. It seems now that you put values on the same row based on the input data order.
What is the meaning of each row in the new table?

Data never sleeps

PaigeMiller · Posted 05-21-2025 07:30 AM

@LinusH wrote:
I can't see that you can do this without losing data consistency. It seems now that you put values on the same row based on the input data order.
What is the meaning of each row in the new table?

Good point. It is hard to make sense of the new table, without additional explanation.

--
Paige Miller

Kathryn_SAS · Posted 05-21-2025 07:59 AM

Here is another possible way to do this:

data have;
input tbl $ field $;
cards;
tbl1 x
tbl1 y
tbl1 z
tbl2 w
tbl2 x
tbl2 v
tbl2 y
;
run;

data have;
set have;
by tbl;
if first.tbl then flag=0;
flag+1;
run;

proc sort data=have;
by flag tbl;
run;

proc transpose data=have out=want(drop=flag _name_);
by flag;
id tbl;
var field;
run;

proc print data=want;
run;

Long to wide

Re: Long to wide

Re: Long to wide

Re: Long to wide

Re: Long to wide

Re: Long to wide

Re: Long to wide

Re: Long to wide

Re: Long to wide

Re: Long to wide

Re: Long to wide

Re: Long to wide

Re: Long to wide

Registration is open

SAS Training: Just a Click Away