Unique Transpose Question?

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

Unique Transpose Question?

[ Edited ]

I have a summarized data set where the first column has fewer unique values than the second column. I want to be able to transpose the dataset such that the unique values of Column A become a new variable and every corresponding value in column B goes under it. Here's an example:

 

Var1 Var 2

1        x

1        y

1        z

2        q

2        r

2        s

 

 

Becomes:

1  2

x  q

y  r

z  s

 

Proc transpose doesn't seem to be the right solution to this, does anyone have an idea of how I can tackle this? Also the number of unique variables from Var1 are dynamic (doing this multiple times) so manually summarizing by every unqiue varaible in var one and then stacking them together isn't very efficient 


Accepted Solutions
Solution
‎08-21-2017 02:00 PM
Super User
Posts: 21,546

Re: Unique Transpose Question?

Posted in reply to MountainDew123

PROC TRANSPOSE but you first have to add a counter. It seems like the 'easiest' approach in terms of understanding the process.

 

data have;

input Var1 Var2 $ ;
cards;
1        x
1        y
1        z
2        q
2        r
2        s
;
run;

data have2;
set have;
by var1;
if first.var1 then count=1;
else count+1;
run;

proc sort data=have2;
by count;
run;

proc transpose data=have2 out=want prefix=want;
by count;
var var2;
id var1;
run;


View solution in original post


All Replies
Trusted Advisor
Posts: 1,683

Re: Unique Transpose Question?

Posted in reply to MountainDew123

 

proc sort data=have; by var2; run;
proc transpose data=have out=want; by var2; var var1; run;

 

New Contributor
Posts: 3

Re: Unique Transpose Question?

Var 1 will be no more than 12 possible values with length between 3-30 and var 2 across all of var 1's will probably be no more than 25-30 with a consistent length of about 5.

Super User
Posts: 6,005

Re: Unique Transpose Question?

Posted in reply to MountainDew123

That's a relatively small number of values.  I'm inclined to load them all into memory in the right order:

 

data halfway_there;

array allvals {12, 30} $ 32;

do until (done);

   set have end=done;

   by val1 notsorted;

   if first.var1 then do;

      col + 1;

      row=1;

      vals{row, col} = var1;

   end;

   row + 1;

   vals{row, col} = var2;

end;

call symputx('nrows', row);

call symputx('ncols', col);

run;

 

That much gives you a two-dimensional array with all the values in the "right" places.  In addition, the final value of ROW and COL tell you how much of the array is actually filled.  So the DATA step captures how many rows and columns were actually needed.  That's the CALL SYMPUTX part (which relies on the data being consistent ... same number of rows for every column).

 

With the number of required rows and columns known, a second DATA step could "unpack" the array:

 

data want;

set halfway_there;

length var1-var&ncols $ 32;

array var {&ncols};

keep var1-var&ncols;

array allvals {12, 30};

do row=1 to &nrows;

   do col=1 to &ncols;

      var{col} = allvals{row, col};

   end;

   output;

end;

run;

 

The code is untested, so might need to be tweaked.  But it's definitely a viable approach.

Super User
Posts: 6,005

Re: Unique Transpose Question?

Posted in reply to MountainDew123

Clearly, VAR1 will have to be character, in order for this to happen.

 

Can you describe the data a bit:

 

  • Maximum number of VAR1 values you might find in your data
  • Similarly, maximum number of VAR2 values
  • Maximum size of a VAR2 value (number of characters)

I'm thinking it might be possible to load the entire data set into memory at once, but your answers will help determine that.

PROC Star
Posts: 831

Re: Unique Transpose Question?

Posted in reply to Astounding

@MountainDew123  Hi, It's a very interesting question and I am little stumped to make it elegant as my brain isn't working the best perhaps because of too much of coffee. I will await for super elegant solution from a super user like you. However, here is my ugly yet working solution for the time being:-

 

 

data have;

input (Var1 Var2) ($);

datalines;

1        x

1        y

1        z

2        q

2        r

2        s

;

 

/* I split into multiple datasets in this step*/

data _null_;

 declare hash myhash(multidata:'y');

      rc = myhash.defineKey('var1');

      rc = myhash.defineData('var1','need');

      rc = myhash.defineDone( );

do until(last.var1);

  set have;

  by var1;

  if first.var1 then

  do;

       myhash.clear();

      need=var1;

      myhash.add();

       need=var2;

        myhash.add();

  end;

  else do;

   need=var2;

  myhash.add();

  end;

  if last.var1 then  myhash.output(dataset:cats('var',var1));

end;

run;

 

/*I combine the datasets obtained in the previous step to get your WANT*/

data want;

set var1(rename=(need=need1));

set var2(rename=(need=need2));

drop var1;

run;

 

Regards,

Naveen Srinivasan

Solution
‎08-21-2017 02:00 PM
Super User
Posts: 21,546

Re: Unique Transpose Question?

Posted in reply to MountainDew123

PROC TRANSPOSE but you first have to add a counter. It seems like the 'easiest' approach in terms of understanding the process.

 

data have;

input Var1 Var2 $ ;
cards;
1        x
1        y
1        z
2        q
2        r
2        s
;
run;

data have2;
set have;
by var1;
if first.var1 then count=1;
else count+1;
run;

proc sort data=have2;
by count;
run;

proc transpose data=have2 out=want prefix=want;
by count;
var var2;
id var1;
run;


New Contributor
Posts: 3

Re: Unique Transpose Question?

Worked like a charm thanks a ton

PROC Star
Posts: 7,800

Re: Unique Transpose Question?

Posted in reply to MountainDew123

There is probably a less convoluted way to do this (assuming I correctly understand what you want) but, if it is what you want, here is one way to do it:

data have;
  input Var1 Var2 $;
  cards;
1        x
1        y
1        z
2        q
2        r
2        s
;

proc sql noprint;
  select min(var1) into :minv
    from have
  ;
  
  select n(var1)
    into :obs
      from have
        where Var1 eq &minv.
  ;
quit;

data need;
  set have;
  _obs=ifn(mod(_n_,&obs.) gt 0,mod(_n_,&obs.),&obs.);
run;

proc sort data=need;
 by _obs;
run;

options validvarname=any;
proc transpose data=need out=want (drop=_:);
  by _obs;
  var var2;
  id var1;
run;

Art, CEO, AnalystFinder.com

 

Esteemed Advisor
Posts: 5,129

Re: Unique Transpose Question?

Posted in reply to MountainDew123

Assuming you want to pair values based on their order:

 

data have;
input Var1 Var2 $;
datalines;
1        x
1        y
1        z
2        q
2        r
2        s
;

data temp;
set have; by var1;
if first.var1 then order = 0;
order + 1;
run;

proc sort data=temp; by order var1; run;

proc transpose data=temp out=want(drop=_name_ order) prefix=_;
by order;
id var1;
idlabel var1;
var var2;
run;

proc print noobs label;
run;
PG
Super User
Posts: 10,323

Re: Unique Transpose Question?

Posted in reply to MountainDew123
Double proc transpose can do that .


data have;
input Var1 Var2 $;
datalines;
1        x
1        y
1        z
2        q
2        r
2        s
;

proc transpose data=have out=temp;
by var1;
var var2;
run;
proc transpose data=temp out=want;
id var1;
var col:;
run;



OR Merge Skill.


proc sql;
select distinct catt('have(where=(var1=',var1,') rename=(var2=_',var1,'))')
 into : list separated by ' '
  from have;
quit;

data want2;
 merge &list;
 drop var1;
run;

PROC Star
Posts: 7,800

Re: Unique Transpose Question?

I definitely like @Ksharp's double transpose solution more than the code I suggested and, if you really want the resulting variables to be named 1 and 2, you'd just have to insert:

options validvarname=any;

before the second proc transpose

 

Art, CEO, AnalystFinder.com

 

Super User
Super User
Posts: 7,399

Re: Unique Transpose Question?

[ Edited ]
Posted in reply to MountainDew123

Why did you make the first observations have col1='x' and col2='q'?  Do you want to just align the values by their order in the original dataset?  If so you need a new variable to use proc transpose.

data middle ;
  set have ;
  by var1;
  row+1;
  if first.var1 then row=1;
run;
proc sort;
  by row var1 ;
run;
proc tranpose data=middle out=want ;
  by row ;
  id var1;
  var var2;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 331 views
  • 2 likes
  • 9 in conversation