Solved: transforming long form to wide form; multiple character and numerical ...

huanaca · Posted 10-17-2012 08:44 PM

Hello,

I would greatly appreciate help transforming my data from long to wide. I have not be able to work out the code. Thank you in advance for your help.

It is a very large data set, but the organization looks like this:

VARIABLES:

CahseeTestDate04 (Character)

CahseeAttemptQty04 (Num)

StdPseudoId (Num)

CahseeTestName04 (Char)

CahseeScaledScore04 (Num)

WHAT I HAVE:

CahseeTestDate04 CahseeAttemptQty04 StdPseudoId CahseeTestName04 CahseeScaledScore04

11 May 2004 1 100014886645 ELA 432

16 Mar 2004 1 100014886645 MAT 385

24 Apr 2004 2 200079994422 ELA 345

03 Apr 2004 3 450029746555 ELA 450

05 May 2004 3 450029746555 MAT 440

26 Mar 2004 2 770033338888 MAT 367

WHAT I WANT:

StdPseudoId	CahseeTestDate04ELA	CahseeAttemptQty04ELA	CahseeScaledScore04ELA	CahseeTestDate04MAT	CahseeAttemptQty04MAT	CahseeScaledScore04MAT
100014886645	11 May 2004	1	432	16 Mar 2004	1	385
200079994422	24 Apr 2004	2	345
450029746555	03 Apr 2004	3	450	05 May 2004	3	440
770033338888				26 Mar 2004	2	367

art297 · Posted 10-17-2012 10:32 PM

There is probably a more direct way but, at least, the following will achieve what you want:

data part1 (keep=StdPseudoId CahseeTestDate04 _name_);

set have;

_name_=cats("cahseeTestDate04",CahseeTestName04);

run;

proc transpose data=part1 out=out1 (drop=_:);

by StdPseudoId;

run;

data part2 (keep=StdPseudoId CahseeAttemptQty04 _name_);

set have;

_name_=cats("CahseeAttemptQty04",CahseeTestName04);

run;

proc transpose data=part2 out=out2 (drop=_:);

by StdPseudoId;

run;

data part3 (keep=StdPseudoId CahseeScaledScore04 _name_);

set have;

_name_=cats("CahseeScaledScore04",CahseeTestName04);

run;

proc transpose data=part3 out=out3 (drop=_:);

by StdPseudoId;

run;

data want;

set out1;

set out2;

set out3;

run;

If you want the variables reordered, you could do that with one additional datastep using a retain statement.

View solution in original post

art297 · Posted 10-17-2012 10:32 PM

There is probably a more direct way but, at least, the following will achieve what you want:

data part1 (keep=StdPseudoId CahseeTestDate04 _name_);

set have;

_name_=cats("cahseeTestDate04",CahseeTestName04);

run;

proc transpose data=part1 out=out1 (drop=_:);

by StdPseudoId;

run;

data part2 (keep=StdPseudoId CahseeAttemptQty04 _name_);

set have;

_name_=cats("CahseeAttemptQty04",CahseeTestName04);

run;

proc transpose data=part2 out=out2 (drop=_:);

by StdPseudoId;

run;

data part3 (keep=StdPseudoId CahseeScaledScore04 _name_);

set have;

_name_=cats("CahseeScaledScore04",CahseeTestName04);

run;

proc transpose data=part3 out=out3 (drop=_:);

by StdPseudoId;

run;

data want;

set out1;

set out2;

set out3;

run;

If you want the variables reordered, you could do that with one additional datastep using a retain statement.

huanaca · Posted 10-18-2012 02:49 PM

Thanks very much for your response. It seems to have worked very well, with the exception of out1 -- which does not contain any observations for some reason. So then the final 'want' file contains no observations, though it has the correct variables. Is it possible this has something to do with CahseeTestDate04 being a character variable, whereas CahseeAttemptQty04 and CahseeScaledScore04 are number variables?

1111 data part1 (keep=StdPseudoId CahseeTestDate04 _name_);

1112 set have;

1113 _name_=cats("CahseeTestDate04",CahseeTestName04);

1114 run;

NOTE: There were 85651 observations read from the data set WORK.HAVE.

NOTE: The data set WORK.PART1 has 85651 observations and 3 variables.

NOTE: Compressing data set WORK.PART1 decreased size by 76.62 percent.

Compressed is 371 pages; un-compressed would require 1587 pages.

NOTE: DATA statement used (Total process time):

real time 0.20 seconds

cpu time 0.18 seconds

1115

1116 proc transpose data=part1 out=out1 (drop=_:);

1117 by StdPseudoId;

1118 run;

NOTE: No variables to transpose.

NOTE: There were 85651 observations read from the data set WORK.PART1.

NOTE: The data set WORK.OUT1 has 0 observations and 3 variables.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

real time 0.60 seconds

cpu time 0.60 seconds

1119

1120 data part2 (keep=StdPseudoId CahseeAttemptQty04 _name_);

1121 set have;

1122 _name_=cats("CahseeAttemptQty04",CahseeTestName04);

1123 run;

NOTE: There were 85651 observations read from the data set WORK.HAVE.

NOTE: The data set WORK.PART2 has 85651 observations and 3 variables.

NOTE: Compressing data set WORK.PART2 decreased size by 78.43 percent.

Compressed is 330 pages; un-compressed would require 1530 pages.

NOTE: DATA statement used (Total process time):

real time 0.17 seconds

cpu time 0.17 seconds

1124

1125 proc transpose data=part2 out=out2 (drop=_:);

1126 by StdPseudoId;

1127 run;

NOTE: There were 85651 observations read from the data set WORK.PART2.

NOTE: The data set WORK.OUT2 has 44431 observations and 3 variables.

NOTE: Compressing data set WORK.OUT2 increased size by 19.62 percent.

Compressed is 317 pages; un-compressed would require 265 pages.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

real time 0.87 seconds

cpu time 0.82 seconds

1128

1129 data part3 (keep=StdPseudoId CahseeScaledScore04 _name_);

1130 set have;

1131 _name_=cats("CahseeScaledScore04",CahseeTestName04);

1132 run;

NOTE: There were 85651 observations read from the data set WORK.HAVE.

NOTE: The data set WORK.PART3 has 85651 observations and 3 variables.

NOTE: Compressing data set WORK.PART3 decreased size by 77.19 percent.

Compressed is 349 pages; un-compressed would require 1530 pages.

NOTE: DATA statement used (Total process time):

real time 0.23 seconds

cpu time 0.23 seconds

1133

1134 proc transpose data=part3 out=out3 (drop=_:);

1135 by StdPseudoId;

1136 run;

NOTE: There were 85651 observations read from the data set WORK.PART3.

NOTE: The data set WORK.OUT3 has 44431 observations and 3 variables.

NOTE: Compressing data set WORK.OUT3 increased size by 27.55 percent.

Compressed is 338 pages; un-compressed would require 265 pages.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

real time 0.78 seconds

cpu time 0.76 seconds

1137

1138 data want;

1139 set out1;

1140 set out2;

1141 set out3;

1142 run;

NOTE: There were 0 observations read from the data set WORK.OUT1.

NOTE: The data set WORK.WANT has 0 observations and 7 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

art297 · Posted 10-18-2012 03:23 PM

Yes, that would indeed create a problem, but an easy one to fix. Just replace the first transpose with:

proc transpose data=part1 out=out1 (drop=_:);

var CahseeTestDate04;

by StdPseudoId;

run;

huanaca · Posted 10-18-2012 05:40 PM

Thanks very much. That worked perfectly. I certainly appreciate your time. (I'm learning!)

Ksharp · Posted 10-18-2012 02:27 AM

Suppose every StdPseudoId only one CahseeTestName04 .

data x;
input CahseeTestDate04  & $20.  CahseeAttemptQty04    StdPseudoId    CahseeTestName04 $   CahseeScaledScore04 ;
cards;
11 May 2004                        1                        100014886645            ELA                                    432
16 Mar 2004                        1                        100014886645             MAT                                   385
24 Apr 2004                         2                       200079994422             ELA                                    345
03 Apr 2004                         3                       450029746555             ELA                                    450
05 May 2004                        3                       450029746555             MAT                                   440
26 Mar 2004                         2                       770033338888             MAT                                   367
;
run;
proc sql noprint;
select distinct catt('x(rename=(CahseeTestDate04=CahseeTestDate04',strip(CahseeTestName04),
                            ' CahseeAttemptQty04=CahseeAttemptQty04',strip(CahseeTestName04),
                            ' CahseeScaledScore04=CahseeScaledScore04',strip(CahseeTestName04),
                     ') where=(CahseeTestName04="',CahseeTestName04,'"))' ) into : list separated by ' '
from x;
quit;

data want;
 merge &list ;
 by StdPseudoId;
run;

Ksharp

art297 · Posted 10-18-2012 01:48 PM

You had posted but apparently deleted a response that included your log. While I can no longer check it from what I recall the error you were receiving was the result of having multiple records, for either ELA, MAT or both, that also had missing values for the StdPseudoId field.

The method will fail if any id has multiple records for either ELA or MAT.

huanaca · Posted 10-18-2012 02:51 PM

thanks, yeah. I had found two observations with missing IDs. When I deleted those it was ok. However, I have another problem. I just replied to you with that one.

transforming long form to wide form; multiple character and numerical variables

Re: transforming long form to wide form; multiple character and numerical variables

Re: transforming long form to wide form; multiple character and numerical variables

Re: transforming long form to wide form; multiple character and numerical variables

Re: transforming long form to wide form; multiple character and numerical variables

Re: transforming long form to wide form; multiple character and numerical variables

Re: transforming long form to wide form; multiple character and numerical variables

Re: transforming long form to wide form; multiple character and numerical variables

Re: transforming long form to wide form; multiple character and numerical variables

SAS Innovate 2025: Call for Content

Classroom Training Available!