Help using Base SAS procedures

transforming long form to wide form; multiple character and numerical variables

Accepted Solution Solved
Reply
Contributor
Posts: 24
Accepted Solution

transforming long form to wide form; multiple character and numerical variables

Hello,

I would greatly appreciate help transforming my data from long to wide. I have not be able to work out the code. Thank you in advance for your help.

It is a very large data set, but the organization looks like this:

VARIABLES:

CahseeTestDate04 (Character)

CahseeAttemptQty04 (Num)

StdPseudoId (Num)

CahseeTestName04 (Char)

CahseeScaledScore04 (Num)

WHAT I HAVE:

CahseeTestDate04    CahseeAttemptQty04    StdPseudoId    CahseeTestName04    CahseeScaledScore04

11 May 2004                        1                        100014886645            ELA                                    432

16 Mar 2004                        1                        100014886645             MAT                                   385

24 Apr 2004                         2                       200079994422             ELA                                    345

03 Apr 2004                         3                       450029746555             ELA                                    450

05 May 2004                        3                       450029746555             MAT                                   440

26 Mar 2004                         2                       770033338888             MAT                                   367


WHAT I WANT:

StdPseudoIdCahseeTestDate04ELACahseeAttemptQty04ELACahseeScaledScore04ELACahseeTestDate04MATCahseeAttemptQty04MATCahseeScaledScore04MAT
100014886645

11 May 2004

143216 Mar 20041385
20007999442224 Apr 20042345
450029746555

03 Apr 2004

345005 May 20043440
77003333888826 Mar 2004
2367


                                  
                                                                                                           


Accepted Solutions
Solution
‎10-17-2012 10:32 PM
PROC Star
Posts: 7,363

Re: transforming long form to wide form; multiple character and numerical variables

There is probably a more direct way but, at least, the following will achieve what you want:

data part1 (keep=StdPseudoId CahseeTestDate04 _name_);

  set have;

  _name_=cats("cahseeTestDate04",CahseeTestName04);

run;

proc transpose data=part1 out=out1 (drop=_Smiley Happy;

  by StdPseudoId;

run;

data part2 (keep=StdPseudoId CahseeAttemptQty04 _name_);

  set have;

  _name_=cats("CahseeAttemptQty04",CahseeTestName04);

run;

proc transpose data=part2 out=out2 (drop=_Smiley Happy;

  by StdPseudoId;

run;

data part3 (keep=StdPseudoId CahseeScaledScore04 _name_);

  set have;

  _name_=cats("CahseeScaledScore04",CahseeTestName04);

run;

proc transpose data=part3 out=out3 (drop=_Smiley Happy;

  by StdPseudoId;

run;

data want;

  set out1;

  set out2;

  set out3;

run;

If you want the variables reordered, you could do that with one additional datastep using a retain statement.

View solution in original post


All Replies
Solution
‎10-17-2012 10:32 PM
PROC Star
Posts: 7,363

Re: transforming long form to wide form; multiple character and numerical variables

There is probably a more direct way but, at least, the following will achieve what you want:

data part1 (keep=StdPseudoId CahseeTestDate04 _name_);

  set have;

  _name_=cats("cahseeTestDate04",CahseeTestName04);

run;

proc transpose data=part1 out=out1 (drop=_Smiley Happy;

  by StdPseudoId;

run;

data part2 (keep=StdPseudoId CahseeAttemptQty04 _name_);

  set have;

  _name_=cats("CahseeAttemptQty04",CahseeTestName04);

run;

proc transpose data=part2 out=out2 (drop=_Smiley Happy;

  by StdPseudoId;

run;

data part3 (keep=StdPseudoId CahseeScaledScore04 _name_);

  set have;

  _name_=cats("CahseeScaledScore04",CahseeTestName04);

run;

proc transpose data=part3 out=out3 (drop=_Smiley Happy;

  by StdPseudoId;

run;

data want;

  set out1;

  set out2;

  set out3;

run;

If you want the variables reordered, you could do that with one additional datastep using a retain statement.

Contributor
Posts: 24

Re: transforming long form to wide form; multiple character and numerical variables

Thanks very much for your response. It seems to have worked very well, with the exception of out1 -- which does not contain any observations for some reason. So then the final 'want' file contains no observations, though it has the correct variables. Is it possible this has something to do with CahseeTestDate04 being a character variable, whereas CahseeAttemptQty04 and CahseeScaledScore04 are number variables?

1111  data part1 (keep=StdPseudoId CahseeTestDate04 _name_);

1112    set have;

1113    _name_=cats("CahseeTestDate04",CahseeTestName04);

1114  run;

NOTE: There were 85651 observations read from the data set WORK.HAVE.

NOTE: The data set WORK.PART1 has 85651 observations and 3 variables.

NOTE: Compressing data set WORK.PART1 decreased size by 76.62 percent.

      Compressed is 371 pages; un-compressed would require 1587 pages.

NOTE: DATA statement used (Total process time):

      real time           0.20 seconds

      cpu time            0.18 seconds

1115

1116  proc transpose data=part1 out=out1 (drop=_Smiley Happy;

1117    by StdPseudoId;

1118  run;

NOTE: No variables to transpose.

NOTE: There were 85651 observations read from the data set WORK.PART1.

NOTE: The data set WORK.OUT1 has 0 observations and 3 variables.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

      real time           0.60 seconds

      cpu time            0.60 seconds

1119

1120  data part2 (keep=StdPseudoId CahseeAttemptQty04 _name_);

1121    set have;

1122    _name_=cats("CahseeAttemptQty04",CahseeTestName04);

1123  run;

NOTE: There were 85651 observations read from the data set WORK.HAVE.

NOTE: The data set WORK.PART2 has 85651 observations and 3 variables.

NOTE: Compressing data set WORK.PART2 decreased size by 78.43 percent.

      Compressed is 330 pages; un-compressed would require 1530 pages.

NOTE: DATA statement used (Total process time):

      real time           0.17 seconds

      cpu time            0.17 seconds

1124

1125  proc transpose data=part2 out=out2 (drop=_Smiley Happy;

1126    by StdPseudoId;

1127  run;

NOTE: There were 85651 observations read from the data set WORK.PART2.

NOTE: The data set WORK.OUT2 has 44431 observations and 3 variables.

NOTE: Compressing data set WORK.OUT2 increased size by 19.62 percent.

      Compressed is 317 pages; un-compressed would require 265 pages.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

      real time           0.87 seconds

      cpu time            0.82 seconds

1128

1129  data part3 (keep=StdPseudoId CahseeScaledScore04 _name_);

1130    set have;

1131    _name_=cats("CahseeScaledScore04",CahseeTestName04);

1132  run;

NOTE: There were 85651 observations read from the data set WORK.HAVE.

NOTE: The data set WORK.PART3 has 85651 observations and 3 variables.

NOTE: Compressing data set WORK.PART3 decreased size by 77.19 percent.

      Compressed is 349 pages; un-compressed would require 1530 pages.

NOTE: DATA statement used (Total process time):

      real time           0.23 seconds

      cpu time            0.23 seconds

1133

1134  proc transpose data=part3 out=out3 (drop=_Smiley Happy;

1135    by StdPseudoId;

1136  run;

NOTE: There were 85651 observations read from the data set WORK.PART3.

NOTE: The data set WORK.OUT3 has 44431 observations and 3 variables.

NOTE: Compressing data set WORK.OUT3 increased size by 27.55 percent.

      Compressed is 338 pages; un-compressed would require 265 pages.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

      real time           0.78 seconds

      cpu time            0.76 seconds

1137

1138  data want;

1139    set out1;

1140    set out2;

1141    set out3;

1142  run;

NOTE: There were 0 observations read from the data set WORK.OUT1.

NOTE: The data set WORK.WANT has 0 observations and 7 variables.

NOTE: DATA statement used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

PROC Star
Posts: 7,363

Re: transforming long form to wide form; multiple character and numerical variables

Yes, that would indeed create a problem, but an easy one to fix.  Just replace the first transpose with:

proc transpose data=part1 out=out1 (drop=_Smiley Happy;

  var CahseeTestDate04;

  by StdPseudoId;

run;

Contributor
Posts: 24

Re: transforming long form to wide form; multiple character and numerical variables

Thanks very much. That worked perfectly. I certainly appreciate your time. (I'm learning!)

Super User
Posts: 9,681

Re: transforming long form to wide form; multiple character and numerical variables

Suppose  every StdPseudoId    only one  CahseeTestName04  .

data x;
input CahseeTestDate04  & $20.  CahseeAttemptQty04    StdPseudoId    CahseeTestName04 $   CahseeScaledScore04 ;
cards;
11 May 2004                        1                        100014886645            ELA                                    432
16 Mar 2004                        1                        100014886645             MAT                                   385
24 Apr 2004                         2                       200079994422             ELA                                    345
03 Apr 2004                         3                       450029746555             ELA                                    450
05 May 2004                        3                       450029746555             MAT                                   440
26 Mar 2004                         2                       770033338888             MAT                                   367
;
run;
proc sql noprint;
select distinct catt('x(rename=(CahseeTestDate04=CahseeTestDate04',strip(CahseeTestName04),
                            ' CahseeAttemptQty04=CahseeAttemptQty04',strip(CahseeTestName04),
                            ' CahseeScaledScore04=CahseeScaledScore04',strip(CahseeTestName04),
                     ') where=(CahseeTestName04="',CahseeTestName04,'"))' ) into : list separated by ' '
from x;
quit;

data want;
 merge &list ;
 by StdPseudoId;
run;

Ksharp

PROC Star
Posts: 7,363

Re: transforming long form to wide form; multiple character and numerical variables

You had posted but apparently deleted a response that included your log.  While I can no longer check it from what I recall the error you were receiving was the result of having multiple records, for either ELA, MAT or both, that also had missing values for the StdPseudoId field.

The method will fail if any id has multiple records for either ELA or MAT.

Contributor
Posts: 24

Re: transforming long form to wide form; multiple character and numerical variables

thanks, yeah. I had found two observations with missing IDs. When I deleted those it was ok.  However, I have another problem. I just replied to you with that one.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 551 views
  • 3 likes
  • 3 in conversation