Hi Everyone,
This might be a little confusing to explain so I will try my best. I have a survey with around ~1000 records that I'm trying to transpose from long to wide, the problem is that three questions in the survey has two questions that can have multiple options selected, and when I use proc transpose with let option it only captures one of the options. I'm unsure of how to have all the responses shown for each record ID, and doing it manually will take a very long time. I've included some sample data of what I'm looking at, my code thus far, and my desired output. Thank you so much!
Sample dataset of what I have:
data have;
input record_id labcode $ protocolcode $ question_value response date :MMDDYY10.;
format date date9.;
datalines;
1 Lab1 P1 38912 2 1/30/2020
1 Lab1 P1 38912 9 1/30/2020
1 Lab1 P1 38912 10 1/30/2020
1 Lab1 P1 60388 5 1/30/2020
1 Lab1 P1 60388 1 1/30/2020
2 Lab2 P3 47577 1 2/15/2015
2 Lab2 P3 47577 2 2/15/2015
2 Lab2 P3 47577 3 2/15/2015
2 Lab2 P3 38912 9 2/15/2015
2 Lab2 P3 38912 10 2/15/2015
3 Lab5 P5 60388 1 6/23/2021
3 Lab5 P5 60388 2 6/23/2021
3 Lab5 P5 38912 8 6/23/2021
3 Lab5 P5 38912 10 6/23/2021
3 Lab5 P5 38912 7 6/23/2021
4 Lab4 P6 38912 9 3/12/2014
4 Lab4 P6 38912 3 3/12/2014
4 Lab4 P6 47577 4 3/12/2014
4 Lab4 P6 47577 5 3/12/2014
4 Lab4 P6 47577 1 3/12/2014
5 Lab9 P8 38912 10 7/6/2020
5 Lab9 P8 38912 9 7/6/2020
5 Lab9 P8 38912 8 7/6/2020
5 Lab9 P8 60388 1 7/6/2020
5 Lab9 P8 60388 5 7/6/2020
;
run;
My transpose code so far:
options validvarname=v7;
proc transpose data=have out = want name=masterquestioncore let;
by record_id labcode protocolcode date; *i need all of these to show up in final dataset so I have them in the sort statement;
var response;
id question_value ;
run;
Here's a table showing my desired output, I am essentially looking for one record ID per row, with the question_values each their own variable and repeats are named as such:
record_Id | labcode | protocolcode | 38912_1 | 38912_2 | 38912_3 | 60388_1 | 60388_2 | 47577_1 | 47577_2 | 47577_3 | date |
1 | Lab1 | P1 | 2 | 9 | 10 | 5 | 1 | 1/30/2020 | |||
2 | Lab2 | P3 | 9 | 10 | 1 | 2 | 3 | 2/15/2015 | |||
3 | Lab5 | P5 | 8 | 7 | 10 | 1 | 2 | 6/23/2021 | |||
4 | Lab4 | P6 | 9 | 3 | 4 | 5 | 1 | 3/12/2014 | |||
5 | Lab9 | P8 | 10 | 9 | 1 | 5 | 7/6/2020 |
Hopefully I have explained it well enough. There are multiple other question_value variables that are other items on the survey, but they are all only one response per record_id, so those are easy to transpose, it is just these that are complicated. Thank you again!
Then you can use the existing dataset directly. Just use RESPONSE in both the ID and VAR statement.
proc transpose data=have prefix=Q delim=_ out=wide(drop=_name_);
by record_id labcode protocolcode date;
id question_value response;
var response;
run;
Result
Are these multiple choice questions? Something like select ALL that apply and the "RESPONSE" variable is the number of the choice selected?
If so then use the question and the response in the ID statement add an extra variable to use the VAR statement.
proc sort;
by record_id labcode protocolcode date question_value response;
run;
data for_transpose;
set have;
picked='X';
run;
proc transpose data=for_transpose prefix=Q delim=_ out=wide(drop=_name_);
by record_id labcode protocolcode date;
id question_value response;
var picked;
run;
Result
p r o r t Q e o Q Q 3 Q Q Q Q Q Q Q Q Q Q Q c l c 3 3 8 6 6 4 4 4 3 3 6 3 4 4 o a o 8 8 9 0 0 7 7 7 8 8 0 8 7 7 r b l 9 9 1 3 3 5 5 5 9 9 3 9 5 5 d c c d 1 1 2 8 8 7 7 7 1 1 8 1 7 7 O _ o o a 2 2 _ 8 8 7 7 7 2 2 8 2 7 7 b i d d t _ _ 1 _ _ _ _ _ _ _ _ _ _ _ s d e e e 2 9 0 1 5 1 2 3 7 8 2 3 4 5 1 1 Lab1 P1 30JAN2020 X X X X X 2 2 Lab2 P3 15FEB2015 X X X X X 3 3 Lab5 P5 23JUN2021 X X X X X 4 4 Lab4 P6 12MAR2014 X X X X X 5 5 Lab9 P8 06JUL2020 X X X X X
Then you can use the existing dataset directly. Just use RESPONSE in both the ID and VAR statement.
proc transpose data=have prefix=Q delim=_ out=wide(drop=_name_);
by record_id labcode protocolcode date;
id question_value response;
var response;
run;
Result
If the values are free text the add an extra variable to indicate which replicate it is and include that new varaible in the ID statement.
data for_transpose;
do rep=1 by 1 until(last.date);
set have;
by record_id labcode protocolcode date;
output;
end;
run;
proc transpose data=for_transpose prefix=Q delim=_ out=wide(drop=_name_);
by record_id labcode protocolcode date;
id question_value rep;
var response;
run;
Result:
p r o r t e o Q Q Q Q Q Q Q Q Q Q Q Q Q Q c l c 3 3 3 6 6 4 4 4 3 3 6 6 4 4 o a o 8 8 8 0 0 7 7 7 8 8 0 0 7 7 r b l 9 9 9 3 3 5 5 5 9 9 3 3 5 5 d c c d 1 1 1 8 8 7 7 7 1 1 8 8 7 7 O _ o o a 2 2 2 8 8 7 7 7 2 2 8 8 7 7 b i d d t _ _ _ _ _ _ _ _ _ _ _ _ _ _ s d e e e 1 2 3 4 5 1 2 3 4 5 1 2 4 5 1 1 Lab1 P1 30JAN2020 2 9 10 5 1 . . . . . . . . . 2 2 Lab2 P3 15FEB2015 . . . . . 1 2 3 9 10 . . . . 3 3 Lab5 P5 23JUN2021 . . 8 . . . . . 10 7 1 2 . . 4 4 Lab4 P6 12MAR2014 9 3 . . . . . 4 . . . . 5 1 5 5 Lab9 P8 06JUL2020 10 9 8 1 5 . . . . . . . . .
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.