Help using Base SAS procedures

Reshape from long to wide form with several identifying variables

Reply
Occasional Contributor
Posts: 6

Reshape from long to wide form with several identifying variables

Hi,

I am trying to reshape a data set from long to wide form.  The data set has several identifying variables.  For example, the data are in the following form:

IDConditionTimeFactor1Factor2Factor3
100A1841
100A2571
100B1889
100B2699
101A1131
101A2996
101B1397
101B2887
102A1555
102A2283
102B1227
102B2278

I want to reshape the data into the following form:

IDA_1_F1A_1_F2A_1_F3A_2_F1A_2_F2A_2_F3B_1_F1B_1_F2B_1_F3B_2_F1B_2_F2B_2_F3
100841571889699
101131996397887
102555283227278

Here is an example data set that I'd like to convert (to the above structure):

data test;

  input ID Condition Time Factor1 Factor2 Factor3;

datalines;

100 A 1 8 4 1

100 A 2 5 7 1

100 B 1 8 8 9

100 B 2 6 9 9

101 A 1 1 3 1

101 A 2 9 9 6

101 B 1 3 9 7

101 B 2 8 8 7

102 A 1 5 5 5

102 A 2 2 8 3

102 B 1 2 2 7

102 B 2 2 7 8

;

run;

I have tried the transpose command, but I don't know how to get it to work with multiple identifying variables.  Thanks so much for your help!


Respected Advisor
Posts: 4,919

Reshape from long to wide form with several identifying variables

data test;
input ID Condition $ Time Factor1 Factor2 Factor3;
datalines;
100 A 1 8 4 1
100 A 2 5 7 1
100 B 1 8 8 9
100 B 2 6 9 9
101 A 1 1 3 1
101 A 2 9 9 6
101 B 1 3 9 7
101 B 2 8 8 7
102 A 1 5 5 5
102 A 2 2 8 3
102 B 1 2 2 7
102 B 2 2 7 8
;

proc transpose data=test out=temp0;
by id condition time notsorted;
var factor:;
run;

data temp1(drop=_Smiley Happy;
set temp0;
factor = cats("F",substr(_name_, anydigit(_name_)));
run;

proc transpose data=temp1 out=want(drop=_NAME_) delimiter="_"n;
by id notsorted;
id condition time factor;
run;

PG

PG
Occasional Contributor
Posts: 6

Reshape from long to wide form with several identifying variables

Awesome - it works quite nicely.  Thank you for your help!

Super User
Posts: 10,018

Reshape from long to wide form with several identifying variables

I prefer to ARRAY.

data test;
input ID Condition $ Time Factor1 Factor2 Factor3;
datalines;
100 A 1 8 4 1
100 A 2 5 7 1
100 B 1 8 8 9
100 B 2 6 9 9
101 A 1 1 3 1
101 A 2 9 9 6
101 B 1 3 9 7
101 B 2 8 8 7
102 A 1 5 5 5
102 A 2 2 8 3
102 B 1 2 2 7
102 B 2 2 7 8
;
run;
data want(keep=id a: b:);
 set test;
 array a{*} A_1_F1     A_1_F2     A_1_F3     A_2_F1     A_2_F2     A_2_F3     B_1_F1     B_1_F2     B_1_F3     B_2_F1     B_2_F2     B_2_F3;
 retain A_1_F1     A_1_F2     A_1_F3     A_2_F1     A_2_F2     A_2_F3     B_1_F1     B_1_F2     B_1_F3     B_2_F1     B_2_F2     B_2_F3;
 i+1;a{i}=Factor1;i+1;a{i}=Factor2; i+1;a{i}=Factor3;
 if mod(_n_,4)=0 then do;output;i=0;end;
run;


Ksharp

New Contributor
Posts: 4

Re: Reshape from long to wide form with several identifying variables

I too prefer arrays in a datastep to proc transpose.  My problem is that I have "incompete" data and need to "square it up" before performing the code you suggested.  Any suggestions for me would be appreciated!

From this:

ID1  QUEST   ANS

123  Q1      Boy

123  Q2      Red

123  Q3      Farm

124  Q1      Girl

124  Q3      City

125  Q2      Blue

126  Q1      Boy

126  Q2      Yellow

126  Q3      City

To this (the quotes are placeholders for missing values):

ID1  Q1    Q2      Q3

123  Boy   Red     Farm

124  Girl  ''      City

125  ''    Blue    ''

126  Boy   Yellow  City

Super Contributor
Posts: 1,636

Re: Reshape from long to wide form with several identifying variables

Hi,

try this:


data have;
input ID  QUEST :$2.   ANS $6.;
cards;
123  Q1 Boy
123  Q2 Red
123  Q3 Farm
124  Q1  Girl
124  Q3 City
125  Q2 Blue
126  Q1 Boy
126  Q2 Yellow
126  Q3 City
;
data want(keep=id q1-q3);
  set have;
  by id;
  array _q(*)$ q1-q3;
  retain q1-q3;
  i=input(substr(quest,2,1),1.);
  _q(i)=ans;
  if last.id then do; output; call missing(q1,q2,q3);end;run;
  proc print;run;
                                obs     ID     q1     q2         q3

                                 1     123    Boy     Red       Farm
                                 2     124    Girl              City
                                 3     125            Blue
                                 4     126    Boy     Yellow    City

Linlin

New Contributor
Posts: 4

Re: Reshape from long to wide form with several identifying variables

Thank you Linlin!

New Contributor
Posts: 4

Re: Reshape from long to wide form with several identifying variables

Thank you MikeZdeb!

New Contributor
Posts: 4

Re: Reshape from long to wide form with several identifying variables

Hi Linlin,

It's been awhile LOL!  I really like your solution; however, in reality Q1-Q3 looks more like:

CKD001CA CKD004DB CKD005DB CKD020CA CKD021CA CKD022DT CKD023DT CKD049CA CKD050CA CKD056CA

Not sure how to create the index.  Any suggestions?

Thanks!

Caroline

Super Contributor
Posts: 1,636

Re: Reshape from long to wide form with several identifying variables

Hi Caroline,

try the code below:

data have;

input ID  QUEST :$8.   ANS $6.;

cards;

123  CKD001CA Boy

123  CKD004DB Red

123  CKD020CA Farm

124  CKD005DB  Girl

124  CKD001CA City

125  CKD021CA Blue

126  CKD005DB Boy

126  CKD021CA Yellow

126  CKD005DB City

;

data want(keep=id ckdSmiley Happy;

  set have;

  by id;

  array _q(*)$ ckd1-ckd21;

  retain ckd:;

  i=input(compress(quest,,'kd'),3.);

  _q(i)=ans;

if last.id then do;output; call missing(of ckdSmiley Happy;end;run;

  proc print;run;

                                 c c c c c c c c c c  c     c

                 c   c c  c   c   c c c c k k k k k k k k k k  k     k

          O      k   k k  k   k   k k k k d d d d d d d d d d  d     d

          b  I   d   d d  d   d   d d d d 1 1 1 1 1 1 1 1 1 1  2     2

          s  D   1   2 3  4   5   6 7 8 9 0 1 2 3 4 5 6 7 8 9  0     1

          1 123 Boy      Red                                  Farm

          2 124 City         Girl

          3 125                                                    Blue

          4 126              City                                  Yellow

Linlin

Valued Guide
Posts: 765

Re: Reshape from long to wide form with several identifying variables

hi ... use Linlin's data ...

proc transpose data=have out=want(drop=_name_);

by id1;

id quest;

var ans;

run;

and here's another way to use an array ...

data want (drop=quest ans);

array q(3) $6;

do until (last.id1);

   set x;

   by id1;

   q(input(char(quest,2),1.)) = ans;

end;

run;

Ask a Question
Discussion stats
  • 10 replies
  • 4837 views
  • 0 likes
  • 6 in conversation