Reshape from long to wide form with several identifying variables

dadrivr · Posted 03-03-2012 08:23 PM

Hi,

I am trying to reshape a data set from long to wide form. The data set has several identifying variables. For example, the data are in the following form:

ID	Condition	Time	Factor1	Factor2	Factor3
100	A	1	8	4	1
100	A	2	5	7	1
100	B	1	8	8	9
100	B	2	6	9	9
101	A	1	1	3	1
101	A	2	9	9	6
101	B	1	3	9	7
101	B	2	8	8	7
102	A	1	5	5	5
102	A	2	2	8	3
102	B	1	2	2	7
102	B	2	2	7	8

I want to reshape the data into the following form:

ID	A_1_F1	A_1_F2	A_1_F3	A_2_F1	A_2_F2	A_2_F3	B_1_F1	B_1_F2	B_1_F3	B_2_F1	B_2_F2	B_2_F3
100	8	4	1	5	7	1	8	8	9	6	9	9
101	1	3	1	9	9	6	3	9	7	8	8	7
102	5	5	5	2	8	3	2	2	7	2	7	8

Here is an example data set that I'd like to convert (to the above structure):

data test;

input ID Condition Time Factor1 Factor2 Factor3;

datalines;

100 A 1 8 4 1

100 A 2 5 7 1

100 B 1 8 8 9

100 B 2 6 9 9

101 A 1 1 3 1

101 A 2 9 9 6

101 B 1 3 9 7

101 B 2 8 8 7

102 A 1 5 5 5

102 A 2 2 8 3

102 B 1 2 2 7

102 B 2 2 7 8

;

run;

I have tried the transpose command, but I don't know how to get it to work with multiple identifying variables. Thanks so much for your help!

PGStats · Posted 03-03-2012 08:52 PM

data test;
input ID Condition $ Time Factor1 Factor2 Factor3;
datalines;
100 A 1 8 4 1
100 A 2 5 7 1
100 B 1 8 8 9
100 B 2 6 9 9
101 A 1 1 3 1
101 A 2 9 9 6
101 B 1 3 9 7
101 B 2 8 8 7
102 A 1 5 5 5
102 A 2 2 8 3
102 B 1 2 2 7
102 B 2 2 7 8
;

proc transpose data=test out=temp0;
by id condition time notsorted;
var factor:;
run;

data temp1(drop=_:);
set temp0;
factor = cats("F",substr(_name_, anydigit(_name_)));
run;

proc transpose data=temp1 out=want(drop=_NAME_) delimiter="_"n;
by id notsorted;
id condition time factor;
run;

PG

dadrivr · Posted 03-05-2012 01:16 AM

Awesome - it works quite nicely. Thank you for your help!

Ksharp · Posted 03-05-2012 05:17 AM

I prefer to ARRAY.

data test;
input ID Condition $ Time Factor1 Factor2 Factor3;
datalines;
100 A 1 8 4 1
100 A 2 5 7 1
100 B 1 8 8 9
100 B 2 6 9 9
101 A 1 1 3 1
101 A 2 9 9 6
101 B 1 3 9 7
101 B 2 8 8 7
102 A 1 5 5 5
102 A 2 2 8 3
102 B 1 2 2 7
102 B 2 2 7 8
;
run;
data want(keep=id a: b:);
 set test;
 array a{*} A_1_F1     A_1_F2     A_1_F3     A_2_F1     A_2_F2     A_2_F3     B_1_F1     B_1_F2     B_1_F3     B_2_F1     B_2_F2     B_2_F3;
 retain A_1_F1     A_1_F2     A_1_F3     A_2_F1     A_2_F2     A_2_F3     B_1_F1     B_1_F2     B_1_F3     B_2_F1     B_2_F2     B_2_F3;
 i+1;a{i}=Factor1;i+1;a{i}=Factor2; i+1;a{i}=Factor3;
 if mod(_n_,4)=0 then do;output;i=0;end;
run;

Ksharp

calijam · Posted 05-09-2012 03:40 PM

I too prefer arrays in a datastep to proc transpose. My problem is that I have "incompete" data and need to "square it up" before performing the code you suggested. Any suggestions for me would be appreciated!

From this:

ID1 QUEST ANS

123 Q1 Boy

123 Q2 Red

123 Q3 Farm

124 Q1 Girl

124 Q3 City

125 Q2 Blue

126 Q1 Boy

126 Q2 Yellow

126 Q3 City

To this (the quotes are placeholders for missing values):

ID1 Q1 Q2 Q3

123 Boy Red Farm

124 Girl '' City

125 '' Blue ''

126 Boy Yellow City

Linlin · Posted 05-09-2012 04:18 PM

Hi,

try this:

data have;
input ID QUEST :$2. ANS $6.;
cards;
123 Q1 Boy
123 Q2 Red
123 Q3 Farm
124 Q1 Girl
124 Q3 City
125 Q2 Blue
126 Q1 Boy
126 Q2 Yellow
126 Q3 City
;
data want(keep=id q1-q3);
set have;
by id;
array _q(*)$ q1-q3;
retain q1-q3;
i=input(substr(quest,2,1),1.);
_q(i)=ans;
if last.id then do; output; call missing(q1,q2,q3);end;run;
proc print;run;
obs ID q1 q2 q3

                                 1     123    Boy     Red       Farm
                                 2     124    Girl              City
                                 3     125            Blue
                                 4     126    Boy     Yellow    City

Linlin

calijam · Posted 05-09-2012 05:55 PM

Thank you Linlin!

calijam · Posted 05-09-2012 05:55 PM

Thank you MikeZdeb!

calijam · Posted 11-07-2012 08:17 PM

Hi Linlin,

It's been awhile LOL! I really like your solution; however, in reality Q1-Q3 looks more like:

CKD001CA CKD004DB CKD005DB CKD020CA CKD021CA CKD022DT CKD023DT CKD049CA CKD050CA CKD056CA

Not sure how to create the index. Any suggestions?

Thanks!

Caroline

Linlin · Posted 11-07-2012 08:41 PM

Hi Caroline,

try the code below:

data have;

input ID QUEST :$8. ANS $6.;

cards;

123 CKD001CA Boy

123 CKD004DB Red

123 CKD020CA Farm

124 CKD005DB Girl

124 CKD001CA City

125 CKD021CA Blue

126 CKD005DB Boy

126 CKD021CA Yellow

126 CKD005DB City

;

data want(keep=id ckd:);

set have;

by id;

array _q(*)$ ckd1-ckd21;

retain ckd:;

i=input(compress(quest,,'kd'),3.);

_q(i)=ans;

if last.id then do;output; call missing(of ckd:);end;run;

proc print;run;

c c c c c c c c c c c c

c c c c c c c c c k k k k k k k k k k k k

O k k k k k k k k k d d d d d d d d d d d d

b I d d d d d d d d d 1 1 1 1 1 1 1 1 1 1 2 2

s D 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

1 123 Boy Red Farm

2 124 City Girl

3 125 Blue

4 126 City Yellow

Linlin

MikeZdeb · Posted 05-09-2012 04:41 PM

hi ... use Linlin's data ...

proc transpose data=have out=want(drop=_name_);

by id1;

id quest;

var ans;

run;

and here's another way to use an array ...

data want (drop=quest ans);

array q(3) $6;

do until (last.id1);

set x;

by id1;

q(input(char(quest,2),1.)) = ans;

end;

run;