I have a summarized data set where the first column has fewer unique values than the second column. I want to be able to transpose the dataset such that the unique values of Column A become a new variable and every corresponding value in column B goes under it. Here's an example:
Var1 Var 2
1 x
1 y
1 z
2 q
2 r
2 s
Becomes:
1 2
x q
y r
z s
Proc transpose doesn't seem to be the right solution to this, does anyone have an idea of how I can tackle this? Also the number of unique variables from Var1 are dynamic (doing this multiple times) so manually summarizing by every unqiue varaible in var one and then stacking them together isn't very efficient
PROC TRANSPOSE but you first have to add a counter. It seems like the 'easiest' approach in terms of understanding the process.
data have;
input Var1 Var2 $ ;
cards;
1 x
1 y
1 z
2 q
2 r
2 s
;
run;
data have2;
set have;
by var1;
if first.var1 then count=1;
else count+1;
run;
proc sort data=have2;
by count;
run;
proc transpose data=have2 out=want prefix=want;
by count;
var var2;
id var1;
run;
proc sort data=have; by var2; run;
proc transpose data=have out=want;
by var2;
var var1;
run;
Var 1 will be no more than 12 possible values with length between 3-30 and var 2 across all of var 1's will probably be no more than 25-30 with a consistent length of about 5.
That's a relatively small number of values. I'm inclined to load them all into memory in the right order:
data halfway_there;
array allvals {12, 30} $ 32;
do until (done);
set have end=done;
by val1 notsorted;
if first.var1 then do;
col + 1;
row=1;
vals{row, col} = var1;
end;
row + 1;
vals{row, col} = var2;
end;
call symputx('nrows', row);
call symputx('ncols', col);
run;
That much gives you a two-dimensional array with all the values in the "right" places. In addition, the final value of ROW and COL tell you how much of the array is actually filled. So the DATA step captures how many rows and columns were actually needed. That's the CALL SYMPUTX part (which relies on the data being consistent ... same number of rows for every column).
With the number of required rows and columns known, a second DATA step could "unpack" the array:
data want;
set halfway_there;
length var1-var&ncols $ 32;
array var {&ncols};
keep var1-var&ncols;
array allvals {12, 30};
do row=1 to &nrows;
do col=1 to &ncols;
var{col} = allvals{row, col};
end;
output;
end;
run;
The code is untested, so might need to be tweaked. But it's definitely a viable approach.
Clearly, VAR1 will have to be character, in order for this to happen.
Can you describe the data a bit:
I'm thinking it might be possible to load the entire data set into memory at once, but your answers will help determine that.
@MountainDew123 Hi, It's a very interesting question and I am little stumped to make it elegant as my brain isn't working the best perhaps because of too much of coffee. I will await for super elegant solution from a super user like you. However, here is my ugly yet working solution for the time being:-
data have;
input (Var1 Var2) ($);
datalines;
1 x
1 y
1 z
2 q
2 r
2 s
;
/* I split into multiple datasets in this step*/
data _null_;
declare hash myhash(multidata:'y');
rc = myhash.defineKey('var1');
rc = myhash.defineData('var1','need');
rc = myhash.defineDone( );
do until(last.var1);
set have;
by var1;
if first.var1 then
do;
myhash.clear();
need=var1;
myhash.add();
need=var2;
myhash.add();
end;
else do;
need=var2;
myhash.add();
end;
if last.var1 then myhash.output(dataset:cats('var',var1));
end;
run;
/*I combine the datasets obtained in the previous step to get your WANT*/
data want;
set var1(rename=(need=need1));
set var2(rename=(need=need2));
drop var1;
run;
Regards,
Naveen Srinivasan
PROC TRANSPOSE but you first have to add a counter. It seems like the 'easiest' approach in terms of understanding the process.
data have;
input Var1 Var2 $ ;
cards;
1 x
1 y
1 z
2 q
2 r
2 s
;
run;
data have2;
set have;
by var1;
if first.var1 then count=1;
else count+1;
run;
proc sort data=have2;
by count;
run;
proc transpose data=have2 out=want prefix=want;
by count;
var var2;
id var1;
run;
Worked like a charm thanks a ton
There is probably a less convoluted way to do this (assuming I correctly understand what you want) but, if it is what you want, here is one way to do it:
data have; input Var1 Var2 $; cards; 1 x 1 y 1 z 2 q 2 r 2 s ; proc sql noprint; select min(var1) into :minv from have ; select n(var1) into :obs from have where Var1 eq &minv. ; quit; data need; set have; _obs=ifn(mod(_n_,&obs.) gt 0,mod(_n_,&obs.),&obs.); run; proc sort data=need; by _obs; run; options validvarname=any; proc transpose data=need out=want (drop=_:); by _obs; var var2; id var1; run;
Art, CEO, AnalystFinder.com
Assuming you want to pair values based on their order:
data have;
input Var1 Var2 $;
datalines;
1 x
1 y
1 z
2 q
2 r
2 s
;
data temp;
set have; by var1;
if first.var1 then order = 0;
order + 1;
run;
proc sort data=temp; by order var1; run;
proc transpose data=temp out=want(drop=_name_ order) prefix=_;
by order;
id var1;
idlabel var1;
var var2;
run;
proc print noobs label;
run;
Double proc transpose can do that . data have; input Var1 Var2 $; datalines; 1 x 1 y 1 z 2 q 2 r 2 s ; proc transpose data=have out=temp; by var1; var var2; run; proc transpose data=temp out=want; id var1; var col:; run; OR Merge Skill. proc sql; select distinct catt('have(where=(var1=',var1,') rename=(var2=_',var1,'))') into : list separated by ' ' from have; quit; data want2; merge &list; drop var1; run;
I definitely like @Ksharp's double transpose solution more than the code I suggested and, if you really want the resulting variables to be named 1 and 2, you'd just have to insert:
options validvarname=any;
before the second proc transpose
Art, CEO, AnalystFinder.com
Why did you make the first observations have col1='x' and col2='q'? Do you want to just align the values by their order in the original dataset? If so you need a new variable to use proc transpose.
data middle ;
set have ;
by var1;
row+1;
if first.var1 then row=1;
run;
proc sort;
by row var1 ;
run;
proc tranpose data=middle out=want ;
by row ;
id var1;
var var2;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.