Solved: Re: Unique Transpose Question?

MountainDew123 · Posted 08-17-2017 04:26 PM

I have a summarized data set where the first column has fewer unique values than the second column. I want to be able to transpose the dataset such that the unique values of Column A become a new variable and every corresponding value in column B goes under it. Here's an example:

Var1 Var 2

1 x

1 y

1 z

2 q

2 r

2 s

Becomes:

1 2

x q

y r

z s

Proc transpose doesn't seem to be the right solution to this, does anyone have an idea of how I can tackle this? Also the number of unique variables from Var1 are dynamic (doing this multiple times) so manually summarizing by every unqiue varaible in var one and then stacking them together isn't very efficient

Reeza · Posted 08-17-2017 06:00 PM

PROC TRANSPOSE but you first have to add a counter. It seems like the 'easiest' approach in terms of understanding the process.

data have;

input Var1 Var2 $ ;
cards;
1        x
1        y
1        z
2        q
2        r
2        s
;
run;

data have2;
set have;
by var1;
if first.var1 then count=1;
else count+1;
run;

proc sort data=have2;
by count;
run;

proc transpose data=have2 out=want prefix=want;
by count;
var var2;
id var1;
run;

View solution in original post

Shmuel · Posted 08-17-2017 04:36 PM

proc sort data=have; by var2; run;
proc transpose data=have out=want;
  by var2;
  var var1;
run;

MountainDew123 · Posted 08-21-2017 07:44 AM

Var 1 will be no more than 12 possible values with length between 3-30 and var 2 across all of var 1's will probably be no more than 25-30 with a consistent length of about 5.

Astounding · Posted 08-21-2017 08:16 AM

That's a relatively small number of values. I'm inclined to load them all into memory in the right order:

data halfway_there;

array allvals {12, 30} $ 32;

do until (done);

set have end=done;

by val1 notsorted;

if first.var1 then do;

col + 1;

row=1;

vals{row, col} = var1;

end;

row + 1;

vals{row, col} = var2;

end;

call symputx('nrows', row);

call symputx('ncols', col);

run;

That much gives you a two-dimensional array with all the values in the "right" places. In addition, the final value of ROW and COL tell you how much of the array is actually filled. So the DATA step captures how many rows and columns were actually needed. That's the CALL SYMPUTX part (which relies on the data being consistent ... same number of rows for every column).

With the number of required rows and columns known, a second DATA step could "unpack" the array:

data want;

set halfway_there;

length var1-var&ncols $ 32;

array var {&ncols};

keep var1-var&ncols;

array allvals {12, 30};

do row=1 to &nrows;

do col=1 to &ncols;

var{col} = allvals{row, col};

end;

output;

end;

run;

The code is untested, so might need to be tweaked. But it's definitely a viable approach.

Astounding · Posted 08-17-2017 04:58 PM

Clearly, VAR1 will have to be character, in order for this to happen.

Can you describe the data a bit:

Maximum number of VAR1 values you might find in your data
Similarly, maximum number of VAR2 values
Maximum size of a VAR2 value (number of characters)

I'm thinking it might be possible to load the entire data set into memory at once, but your answers will help determine that.

novinosrin · Posted 08-17-2017 05:38 PM

@MountainDew123 Hi, It's a very interesting question and I am little stumped to make it elegant as my brain isn't working the best perhaps because of too much of coffee. I will await for super elegant solution from a super user like you. However, here is my ugly yet working solution for the time being:-

data have;

input (Var1 Var2) ($);

datalines;

1 x

1 y

1 z

2 q

2 r

2 s

;

/* I split into multiple datasets in this step*/

data _null_;

declare hash myhash(multidata:'y');

rc = myhash.defineKey('var1');

rc = myhash.defineData('var1','need');

rc = myhash.defineDone( );

do until(last.var1);

set have;

by var1;

if first.var1 then

do;

myhash.clear();

need=var1;

myhash.add();

need=var2;

myhash.add();

end;

else do;

need=var2;

myhash.add();

end;

if last.var1 then myhash.output(dataset:cats('var',var1));

end;

run;

/*I combine the datasets obtained in the previous step to get your WANT*/

data want;

set var1(rename=(need=need1));

set var2(rename=(need=need2));

drop var1;

run;

Regards,

Naveen Srinivasan

Reeza · Posted 08-17-2017 06:00 PM

PROC TRANSPOSE but you first have to add a counter. It seems like the 'easiest' approach in terms of understanding the process.

data have;

input Var1 Var2 $ ;
cards;
1        x
1        y
1        z
2        q
2        r
2        s
;
run;

data have2;
set have;
by var1;
if first.var1 then count=1;
else count+1;
run;

proc sort data=have2;
by count;
run;

proc transpose data=have2 out=want prefix=want;
by count;
var var2;
id var1;
run;

MountainDew123 · Posted 08-21-2017 02:01 PM

Worked like a charm thanks a ton

art297 · Posted 08-17-2017 06:00 PM

There is probably a less convoluted way to do this (assuming I correctly understand what you want) but, if it is what you want, here is one way to do it:

data have;
  input Var1 Var2 $;
  cards;
1        x
1        y
1        z
2        q
2        r
2        s
;

proc sql noprint;
  select min(var1) into :minv
    from have
  ;
  
  select n(var1)
    into :obs
      from have
        where Var1 eq &minv.
  ;
quit;

data need;
  set have;
  _obs=ifn(mod(_n_,&obs.) gt 0,mod(_n_,&obs.),&obs.);
run;

proc sort data=need;
 by _obs;
run;

options validvarname=any;
proc transpose data=need out=want (drop=_:);
  by _obs;
  var var2;
  id var1;
run;

Art, CEO, AnalystFinder.com

PGStats · Posted 08-17-2017 11:31 PM

Assuming you want to pair values based on their order:

data have;
input Var1 Var2 $;
datalines;
1        x
1        y
1        z
2        q
2        r
2        s
;

data temp;
set have; by var1;
if first.var1 then order = 0;
order + 1;
run;

proc sort data=temp; by order var1; run;

proc transpose data=temp out=want(drop=_name_ order) prefix=_;
by order;
id var1;
idlabel var1;
var var2;
run;

proc print noobs label;
run;

PG

Ksharp · Posted 08-18-2017 09:49 AM

Double proc transpose can do that .


data have;
input Var1 Var2 $;
datalines;
1        x
1        y
1        z
2        q
2        r
2        s
;

proc transpose data=have out=temp;
by var1;
var var2;
run;
proc transpose data=temp out=want;
id var1;
var col:;
run;



OR Merge Skill.


proc sql;
select distinct catt('have(where=(var1=',var1,') rename=(var2=_',var1,'))')
 into : list separated by ' '
  from have;
quit;

data want2;
 merge &list;
 drop var1;
run;

art297 · Posted 08-18-2017 11:02 AM

I definitely like @Ksharp's double transpose solution more than the code I suggested and, if you really want the resulting variables to be named 1 and 2, you'd just have to insert:

options validvarname=any;

before the second proc transpose

Art, CEO, AnalystFinder.com

Tom · Posted 08-21-2017 08:15 AM

Why did you make the first observations have col1='x' and col2='q'? Do you want to just align the values by their order in the original dataset? If so you need a new variable to use proc transpose.

data middle ;
  set have ;
  by var1;
  row+1;
  if first.var1 then row=1;
run;
proc sort;
  by row var1 ;
run;
proc tranpose data=middle out=want ;
  by row ;
  id var1;
  var var2;
run;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away