I have the following dataset:
ID | Question | Response |
alpha | 1 | Y |
alpha | 2 | N |
alpha | 3 | Y |
beta | 1 | N |
beta | 2 | N |
But I want this:
ID | Q1 | Q2 | Q3 |
alpha | Y | N | Y |
beta | N | N |
I tried to use proc transpose as follows:
proc transpose data = have out = want;
by id;
var Question response;
run;
My output is
id | col1 | col2 | col3 |
alpha | 1 | 2 | 3 |
alpha | Y | N | Y |
beta | 1 | 2 | |
beta | N | N |
I'm investigating arrays (admittedly my knowledge is not so good with arrays), but I don't think this is possible.
A workaround I think can be to create a dataset with just Q1 and then run a proc transpose. However, I have over 100 questions, so even with a macro statement it is very onerous.
Any thoughts?
Thanks
how about this code using arrray statement.
data now;
length id $10 Question 8 Response $1;
infile cards;
input id $ Question Response $;
cards;
ALPHA 1 Y
ALPHA 2 N
ALPHA 3 Y
BETA 1 N
BETA 2 N
;
run;
proc sort data=now out=tmp nodupkey;
by descending Question;
run;
data _null_;
set tmp(obs=1);
call symputx('max_q',Question);/* set max of questions */
run;
proc sort data=now out=new;
by id question;
run;
data new(keep=id q1-q&max_q);
set new;
by id question;
array q{&max_q} $1;/* create array */
retain q1-q&max_q;/* keep array variables to next observation */
if first.id then do;/* init array value */
do i=1 to &max_q;
call missing(q{i});
end;
end;
q{question}=Response;/* set response value to array variable by using question values as array subscript. */
if last.id;/* output only last.id observations */
run;
May I ask why you want to transpose the data in first place? A long and narrow structure is better for most SAS procedures and often also easier to deal with in SAS data steps.
Hi @sharonlee , Most SAS users and I will concur @Patrick to keep your dataset long and narrow. The long and narrow offers enormous convenience.
Anyways proc transpose is rather straight forward
data have;
input ID $ Question Response $ ;
cards;
alpha 1 Y
alpha 2 N
alpha 3 Y
beta 1 N
beta 2 N
;
proc transpose data=have out=want(drop=_:) prefix=Q;
by id;
var response;
id Question;
run;
Hi @sharonlee
As proposed by @novinosrin, the ID statement enables you to identify a variable that will be used to name the new columns, which are col1, col2, ... by default. The option ‘prefix=Q’ will add the prefix ‘Q’ to these column names -> so Q1, Q2, ... as the modalities of question are 1, 2, ...)
Best,
proc transpose data=have out=want (drop=_name_) prefix=Q;
by id;
var response;
id question;
run;
Ok it's all good. Well, I posted the transpose in my previous post. I hope that helps.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.