I have the following dataset:
ID | Question | Response |
alpha | 1 | Y |
alpha | 2 | N |
alpha | 3 | Y |
beta | 1 | N |
beta | 2 | N |
But I want this:
ID | Q1 | Q2 | Q3 |
alpha | Y | N | Y |
beta | N | N |
I tried to use proc transpose as follows:
proc transpose data = have out = want;
by id;
var Question response;
run;
My output is
id | col1 | col2 | col3 |
alpha | 1 | 2 | 3 |
alpha | Y | N | Y |
beta | 1 | 2 | |
beta | N | N |
I'm investigating arrays (admittedly my knowledge is not so good with arrays), but I don't think this is possible.
A workaround I think can be to create a dataset with just Q1 and then run a proc transpose. However, I have over 100 questions, so even with a macro statement it is very onerous.
Any thoughts?
Thanks
how about this code using arrray statement.
data now;
length id $10 Question 8 Response $1;
infile cards;
input id $ Question Response $;
cards;
ALPHA 1 Y
ALPHA 2 N
ALPHA 3 Y
BETA 1 N
BETA 2 N
;
run;
proc sort data=now out=tmp nodupkey;
by descending Question;
run;
data _null_;
set tmp(obs=1);
call symputx('max_q',Question);/* set max of questions */
run;
proc sort data=now out=new;
by id question;
run;
data new(keep=id q1-q&max_q);
set new;
by id question;
array q{&max_q} $1;/* create array */
retain q1-q&max_q;/* keep array variables to next observation */
if first.id then do;/* init array value */
do i=1 to &max_q;
call missing(q{i});
end;
end;
q{question}=Response;/* set response value to array variable by using question values as array subscript. */
if last.id;/* output only last.id observations */
run;
May I ask why you want to transpose the data in first place? A long and narrow structure is better for most SAS procedures and often also easier to deal with in SAS data steps.
Hi @sharonlee , Most SAS users and I will concur @Patrick to keep your dataset long and narrow. The long and narrow offers enormous convenience.
Anyways proc transpose is rather straight forward
data have;
input ID $ Question Response $ ;
cards;
alpha 1 Y
alpha 2 N
alpha 3 Y
beta 1 N
beta 2 N
;
proc transpose data=have out=want(drop=_:) prefix=Q;
by id;
var response;
id Question;
run;
Hi @sharonlee
As proposed by @novinosrin, the ID statement enables you to identify a variable that will be used to name the new columns, which are col1, col2, ... by default. The option ‘prefix=Q’ will add the prefix ‘Q’ to these column names -> so Q1, Q2, ... as the modalities of question are 1, 2, ...)
Best,
proc transpose data=have out=want (drop=_name_) prefix=Q;
by id;
var response;
id question;
run;
Ok it's all good. Well, I posted the transpose in my previous post. I hope that helps.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.