BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MountainDew123
Calcite | Level 5

I have a summarized data set where the first column has fewer unique values than the second column. I want to be able to transpose the dataset such that the unique values of Column A become a new variable and every corresponding value in column B goes under it. Here's an example:

 

Var1 Var 2

1        x

1        y

1        z

2        q

2        r

2        s

 

 

Becomes:

1  2

x  q

y  r

z  s

 

Proc transpose doesn't seem to be the right solution to this, does anyone have an idea of how I can tackle this? Also the number of unique variables from Var1 are dynamic (doing this multiple times) so manually summarizing by every unqiue varaible in var one and then stacking them together isn't very efficient 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

PROC TRANSPOSE but you first have to add a counter. It seems like the 'easiest' approach in terms of understanding the process.

 

data have;

input Var1 Var2 $ ;
cards;
1        x
1        y
1        z
2        q
2        r
2        s
;
run;

data have2;
set have;
by var1;
if first.var1 then count=1;
else count+1;
run;

proc sort data=have2;
by count;
run;

proc transpose data=have2 out=want prefix=want;
by count;
var var2;
id var1;
run;


View solution in original post

12 REPLIES 12
Shmuel
Garnet | Level 18

 

proc sort data=have; by var2; run;
proc transpose data=have out=want; by var2; var var1; run;

 

MountainDew123
Calcite | Level 5

Var 1 will be no more than 12 possible values with length between 3-30 and var 2 across all of var 1's will probably be no more than 25-30 with a consistent length of about 5.

Astounding
PROC Star

That's a relatively small number of values.  I'm inclined to load them all into memory in the right order:

 

data halfway_there;

array allvals {12, 30} $ 32;

do until (done);

   set have end=done;

   by val1 notsorted;

   if first.var1 then do;

      col + 1;

      row=1;

      vals{row, col} = var1;

   end;

   row + 1;

   vals{row, col} = var2;

end;

call symputx('nrows', row);

call symputx('ncols', col);

run;

 

That much gives you a two-dimensional array with all the values in the "right" places.  In addition, the final value of ROW and COL tell you how much of the array is actually filled.  So the DATA step captures how many rows and columns were actually needed.  That's the CALL SYMPUTX part (which relies on the data being consistent ... same number of rows for every column).

 

With the number of required rows and columns known, a second DATA step could "unpack" the array:

 

data want;

set halfway_there;

length var1-var&ncols $ 32;

array var {&ncols};

keep var1-var&ncols;

array allvals {12, 30};

do row=1 to &nrows;

   do col=1 to &ncols;

      var{col} = allvals{row, col};

   end;

   output;

end;

run;

 

The code is untested, so might need to be tweaked.  But it's definitely a viable approach.

Astounding
PROC Star

Clearly, VAR1 will have to be character, in order for this to happen.

 

Can you describe the data a bit:

 

  • Maximum number of VAR1 values you might find in your data
  • Similarly, maximum number of VAR2 values
  • Maximum size of a VAR2 value (number of characters)

I'm thinking it might be possible to load the entire data set into memory at once, but your answers will help determine that.

novinosrin
Tourmaline | Level 20

@MountainDew123  Hi, It's a very interesting question and I am little stumped to make it elegant as my brain isn't working the best perhaps because of too much of coffee. I will await for super elegant solution from a super user like you. However, here is my ugly yet working solution for the time being:-

 

 

data have;

input (Var1 Var2) ($);

datalines;

1        x

1        y

1        z

2        q

2        r

2        s

;

 

/* I split into multiple datasets in this step*/

data _null_;

 declare hash myhash(multidata:'y');

      rc = myhash.defineKey('var1');

      rc = myhash.defineData('var1','need');

      rc = myhash.defineDone( );

do until(last.var1);

  set have;

  by var1;

  if first.var1 then

  do;

       myhash.clear();

      need=var1;

      myhash.add();

       need=var2;

        myhash.add();

  end;

  else do;

   need=var2;

  myhash.add();

  end;

  if last.var1 then  myhash.output(dataset:cats('var',var1));

end;

run;

 

/*I combine the datasets obtained in the previous step to get your WANT*/

data want;

set var1(rename=(need=need1));

set var2(rename=(need=need2));

drop var1;

run;

 

Regards,

Naveen Srinivasan

Reeza
Super User

PROC TRANSPOSE but you first have to add a counter. It seems like the 'easiest' approach in terms of understanding the process.

 

data have;

input Var1 Var2 $ ;
cards;
1        x
1        y
1        z
2        q
2        r
2        s
;
run;

data have2;
set have;
by var1;
if first.var1 then count=1;
else count+1;
run;

proc sort data=have2;
by count;
run;

proc transpose data=have2 out=want prefix=want;
by count;
var var2;
id var1;
run;


MountainDew123
Calcite | Level 5

Worked like a charm thanks a ton

art297
Opal | Level 21

There is probably a less convoluted way to do this (assuming I correctly understand what you want) but, if it is what you want, here is one way to do it:

data have;
  input Var1 Var2 $;
  cards;
1        x
1        y
1        z
2        q
2        r
2        s
;

proc sql noprint;
  select min(var1) into :minv
    from have
  ;
  
  select n(var1)
    into :obs
      from have
        where Var1 eq &minv.
  ;
quit;

data need;
  set have;
  _obs=ifn(mod(_n_,&obs.) gt 0,mod(_n_,&obs.),&obs.);
run;

proc sort data=need;
 by _obs;
run;

options validvarname=any;
proc transpose data=need out=want (drop=_:);
  by _obs;
  var var2;
  id var1;
run;

Art, CEO, AnalystFinder.com

 

PGStats
Opal | Level 21

Assuming you want to pair values based on their order:

 

data have;
input Var1 Var2 $;
datalines;
1        x
1        y
1        z
2        q
2        r
2        s
;

data temp;
set have; by var1;
if first.var1 then order = 0;
order + 1;
run;

proc sort data=temp; by order var1; run;

proc transpose data=temp out=want(drop=_name_ order) prefix=_;
by order;
id var1;
idlabel var1;
var var2;
run;

proc print noobs label;
run;
PG
Ksharp
Super User
Double proc transpose can do that .


data have;
input Var1 Var2 $;
datalines;
1        x
1        y
1        z
2        q
2        r
2        s
;

proc transpose data=have out=temp;
by var1;
var var2;
run;
proc transpose data=temp out=want;
id var1;
var col:;
run;



OR Merge Skill.


proc sql;
select distinct catt('have(where=(var1=',var1,') rename=(var2=_',var1,'))')
 into : list separated by ' '
  from have;
quit;

data want2;
 merge &list;
 drop var1;
run;

art297
Opal | Level 21

I definitely like @Ksharp's double transpose solution more than the code I suggested and, if you really want the resulting variables to be named 1 and 2, you'd just have to insert:

options validvarname=any;

before the second proc transpose

 

Art, CEO, AnalystFinder.com

 

Tom
Super User Tom
Super User

Why did you make the first observations have col1='x' and col2='q'?  Do you want to just align the values by their order in the original dataset?  If so you need a new variable to use proc transpose.

data middle ;
  set have ;
  by var1;
  row+1;
  if first.var1 then row=1;
run;
proc sort;
  by row var1 ;
run;
proc tranpose data=middle out=want ;
  by row ;
  id var1;
  var var2;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1236 views
  • 2 likes
  • 9 in conversation