Hi,
I am trying to reshape a data set from long to wide form. The data set has several identifying variables. For example, the data are in the following form:
ID | Condition | Time | Factor1 | Factor2 | Factor3 |
100 | A | 1 | 8 | 4 | 1 |
100 | A | 2 | 5 | 7 | 1 |
100 | B | 1 | 8 | 8 | 9 |
100 | B | 2 | 6 | 9 | 9 |
101 | A | 1 | 1 | 3 | 1 |
101 | A | 2 | 9 | 9 | 6 |
101 | B | 1 | 3 | 9 | 7 |
101 | B | 2 | 8 | 8 | 7 |
102 | A | 1 | 5 | 5 | 5 |
102 | A | 2 | 2 | 8 | 3 |
102 | B | 1 | 2 | 2 | 7 |
102 | B | 2 | 2 | 7 | 8 |
I want to reshape the data into the following form:
ID | A_1_F1 | A_1_F2 | A_1_F3 | A_2_F1 | A_2_F2 | A_2_F3 | B_1_F1 | B_1_F2 | B_1_F3 | B_2_F1 | B_2_F2 | B_2_F3 |
100 | 8 | 4 | 1 | 5 | 7 | 1 | 8 | 8 | 9 | 6 | 9 | 9 |
101 | 1 | 3 | 1 | 9 | 9 | 6 | 3 | 9 | 7 | 8 | 8 | 7 |
102 | 5 | 5 | 5 | 2 | 8 | 3 | 2 | 2 | 7 | 2 | 7 | 8 |
Here is an example data set that I'd like to convert (to the above structure):
data test;
input ID Condition Time Factor1 Factor2 Factor3;
datalines;
100 A 1 8 4 1
100 A 2 5 7 1
100 B 1 8 8 9
100 B 2 6 9 9
101 A 1 1 3 1
101 A 2 9 9 6
101 B 1 3 9 7
101 B 2 8 8 7
102 A 1 5 5 5
102 A 2 2 8 3
102 B 1 2 2 7
102 B 2 2 7 8
;
run;
I have tried the transpose command, but I don't know how to get it to work with multiple identifying variables. Thanks so much for your help!
data test;
input ID Condition $ Time Factor1 Factor2 Factor3;
datalines;
100 A 1 8 4 1
100 A 2 5 7 1
100 B 1 8 8 9
100 B 2 6 9 9
101 A 1 1 3 1
101 A 2 9 9 6
101 B 1 3 9 7
101 B 2 8 8 7
102 A 1 5 5 5
102 A 2 2 8 3
102 B 1 2 2 7
102 B 2 2 7 8
;
proc transpose data=test out=temp0;
by id condition time notsorted;
var factor:;
run;
data temp1(drop=_:);
set temp0;
factor = cats("F",substr(_name_, anydigit(_name_)));
run;
proc transpose data=temp1 out=want(drop=_NAME_) delimiter="_"n;
by id notsorted;
id condition time factor;
run;
PG
Awesome - it works quite nicely. Thank you for your help!
I prefer to ARRAY.
data test; input ID Condition $ Time Factor1 Factor2 Factor3; datalines; 100 A 1 8 4 1 100 A 2 5 7 1 100 B 1 8 8 9 100 B 2 6 9 9 101 A 1 1 3 1 101 A 2 9 9 6 101 B 1 3 9 7 101 B 2 8 8 7 102 A 1 5 5 5 102 A 2 2 8 3 102 B 1 2 2 7 102 B 2 2 7 8 ; run; data want(keep=id a: b:); set test; array a{*} A_1_F1 A_1_F2 A_1_F3 A_2_F1 A_2_F2 A_2_F3 B_1_F1 B_1_F2 B_1_F3 B_2_F1 B_2_F2 B_2_F3; retain A_1_F1 A_1_F2 A_1_F3 A_2_F1 A_2_F2 A_2_F3 B_1_F1 B_1_F2 B_1_F3 B_2_F1 B_2_F2 B_2_F3; i+1;a{i}=Factor1;i+1;a{i}=Factor2; i+1;a{i}=Factor3; if mod(_n_,4)=0 then do;output;i=0;end; run;
Ksharp
I too prefer arrays in a datastep to proc transpose. My problem is that I have "incompete" data and need to "square it up" before performing the code you suggested. Any suggestions for me would be appreciated!
From this:
ID1 QUEST ANS
123 Q1 Boy
123 Q2 Red
123 Q3 Farm
124 Q1 Girl
124 Q3 City
125 Q2 Blue
126 Q1 Boy
126 Q2 Yellow
126 Q3 City
To this (the quotes are placeholders for missing values):
ID1 Q1 Q2 Q3
123 Boy Red Farm
124 Girl '' City
125 '' Blue ''
126 Boy Yellow City
Hi,
try this:
data have;
input ID QUEST :$2. ANS $6.;
cards;
123 Q1 Boy
123 Q2 Red
123 Q3 Farm
124 Q1 Girl
124 Q3 City
125 Q2 Blue
126 Q1 Boy
126 Q2 Yellow
126 Q3 City
;
data want(keep=id q1-q3);
set have;
by id;
array _q(*)$ q1-q3;
retain q1-q3;
i=input(substr(quest,2,1),1.);
_q(i)=ans;
if last.id then do; output; call missing(q1,q2,q3);end;run;
proc print;run;
obs ID q1 q2 q3
1 123 Boy Red Farm
2 124 Girl City
3 125 Blue
4 126 Boy Yellow City
Linlin
Thank you Linlin!
Thank you MikeZdeb!
Hi Linlin,
It's been awhile LOL! I really like your solution; however, in reality Q1-Q3 looks more like:
CKD001CA CKD004DB CKD005DB CKD020CA CKD021CA CKD022DT CKD023DT CKD049CA CKD050CA CKD056CA
Not sure how to create the index. Any suggestions?
Thanks!
Caroline
Hi Caroline,
try the code below:
data have;
input ID QUEST :$8. ANS $6.;
cards;
123 CKD001CA Boy
123 CKD004DB Red
123 CKD020CA Farm
124 CKD005DB Girl
124 CKD001CA City
125 CKD021CA Blue
126 CKD005DB Boy
126 CKD021CA Yellow
126 CKD005DB City
;
data want(keep=id ckd:);
set have;
by id;
array _q(*)$ ckd1-ckd21;
retain ckd:;
i=input(compress(quest,,'kd'),3.);
_q(i)=ans;
if last.id then do;output; call missing(of ckd:);end;run;
proc print;run;
c c c c c c c c c c c c
c c c c c c c c c k k k k k k k k k k k k
O k k k k k k k k k d d d d d d d d d d d d
b I d d d d d d d d d 1 1 1 1 1 1 1 1 1 1 2 2
s D 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1 123 Boy Red Farm
2 124 City Girl
3 125 Blue
4 126 City Yellow
Linlin
hi ... use Linlin's data ...
proc transpose data=have out=want(drop=_name_);
by id1;
id quest;
var ans;
run;
and here's another way to use an array ...
data want (drop=quest ans);
array q(3) $6;
do until (last.id1);
set x;
by id1;
q(input(char(quest,2),1.)) = ans;
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.