I have the following dataset:
ID | Question | Response |
alpha | 1 | Y |
alpha | 2 | N |
alpha | 3 | Y |
beta | 1 | N |
beta | 2 | N |
But I want this:
ID | Q1 | Q2 | Q3 |
alpha | Y | N | Y |
beta | N | N |
I tried to use proc transpose as follows:
proc transpose data = have out = want;
by id;
var Question response;
run;
My output is
id | col1 | col2 | col3 |
alpha | 1 | 2 | 3 |
alpha | Y | N | Y |
beta | 1 | 2 | |
beta | N | N |
I'm investigating arrays (admittedly my knowledge is not so good with arrays), but I don't think this is possible.
A workaround I think can be to create a dataset with just Q1 and then run a proc transpose. However, I have over 100 questions, so even with a macro statement it is very onerous.
Any thoughts?
Thanks
how about this code using arrray statement.
data now;
length id $10 Question 8 Response $1;
infile cards;
input id $ Question Response $;
cards;
ALPHA 1 Y
ALPHA 2 N
ALPHA 3 Y
BETA 1 N
BETA 2 N
;
run;
proc sort data=now out=tmp nodupkey;
by descending Question;
run;
data _null_;
set tmp(obs=1);
call symputx('max_q',Question);/* set max of questions */
run;
proc sort data=now out=new;
by id question;
run;
data new(keep=id q1-q&max_q);
set new;
by id question;
array q{&max_q} $1;/* create array */
retain q1-q&max_q;/* keep array variables to next observation */
if first.id then do;/* init array value */
do i=1 to &max_q;
call missing(q{i});
end;
end;
q{question}=Response;/* set response value to array variable by using question values as array subscript. */
if last.id;/* output only last.id observations */
run;
May I ask why you want to transpose the data in first place? A long and narrow structure is better for most SAS procedures and often also easier to deal with in SAS data steps.
Hi @sharonlee , Most SAS users and I will concur @Patrick to keep your dataset long and narrow. The long and narrow offers enormous convenience.
Anyways proc transpose is rather straight forward
data have;
input ID $ Question Response $ ;
cards;
alpha 1 Y
alpha 2 N
alpha 3 Y
beta 1 N
beta 2 N
;
proc transpose data=have out=want(drop=_:) prefix=Q;
by id;
var response;
id Question;
run;
Hi @sharonlee
As proposed by @novinosrin, the ID statement enables you to identify a variable that will be used to name the new columns, which are col1, col2, ... by default. The option ‘prefix=Q’ will add the prefix ‘Q’ to these column names -> so Q1, Q2, ... as the modalities of question are 1, 2, ...)
Best,
proc transpose data=have out=want (drop=_name_) prefix=Q;
by id;
var response;
id question;
run;
Ok it's all good. Well, I posted the transpose in my previous post. I hope that helps.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.