BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
huanaca
Calcite | Level 5

Hello,

I would greatly appreciate help transforming my data from long to wide. I have not be able to work out the code. Thank you in advance for your help.

It is a very large data set, but the organization looks like this:

VARIABLES:

CahseeTestDate04 (Character)

CahseeAttemptQty04 (Num)

StdPseudoId (Num)

CahseeTestName04 (Char)

CahseeScaledScore04 (Num)

WHAT I HAVE:

CahseeTestDate04    CahseeAttemptQty04    StdPseudoId    CahseeTestName04    CahseeScaledScore04

11 May 2004                        1                        100014886645            ELA                                    432

16 Mar 2004                        1                        100014886645             MAT                                   385

24 Apr 2004                         2                       200079994422             ELA                                    345

03 Apr 2004                         3                       450029746555             ELA                                    450

05 May 2004                        3                       450029746555             MAT                                   440

26 Mar 2004                         2                       770033338888             MAT                                   367


WHAT I WANT:

StdPseudoIdCahseeTestDate04ELACahseeAttemptQty04ELACahseeScaledScore04ELACahseeTestDate04MATCahseeAttemptQty04MATCahseeScaledScore04MAT
100014886645

11 May 2004

143216 Mar 20041385
20007999442224 Apr 20042345
450029746555

03 Apr 2004

345005 May 20043440
77003333888826 Mar 2004
2367


                                  
                                                                                                           

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

There is probably a more direct way but, at least, the following will achieve what you want:

data part1 (keep=StdPseudoId CahseeTestDate04 _name_);

  set have;

  _name_=cats("cahseeTestDate04",CahseeTestName04);

run;

proc transpose data=part1 out=out1 (drop=_:);

  by StdPseudoId;

run;

data part2 (keep=StdPseudoId CahseeAttemptQty04 _name_);

  set have;

  _name_=cats("CahseeAttemptQty04",CahseeTestName04);

run;

proc transpose data=part2 out=out2 (drop=_:);

  by StdPseudoId;

run;

data part3 (keep=StdPseudoId CahseeScaledScore04 _name_);

  set have;

  _name_=cats("CahseeScaledScore04",CahseeTestName04);

run;

proc transpose data=part3 out=out3 (drop=_:);

  by StdPseudoId;

run;

data want;

  set out1;

  set out2;

  set out3;

run;

If you want the variables reordered, you could do that with one additional datastep using a retain statement.

View solution in original post

7 REPLIES 7
art297
Opal | Level 21

There is probably a more direct way but, at least, the following will achieve what you want:

data part1 (keep=StdPseudoId CahseeTestDate04 _name_);

  set have;

  _name_=cats("cahseeTestDate04",CahseeTestName04);

run;

proc transpose data=part1 out=out1 (drop=_:);

  by StdPseudoId;

run;

data part2 (keep=StdPseudoId CahseeAttemptQty04 _name_);

  set have;

  _name_=cats("CahseeAttemptQty04",CahseeTestName04);

run;

proc transpose data=part2 out=out2 (drop=_:);

  by StdPseudoId;

run;

data part3 (keep=StdPseudoId CahseeScaledScore04 _name_);

  set have;

  _name_=cats("CahseeScaledScore04",CahseeTestName04);

run;

proc transpose data=part3 out=out3 (drop=_:);

  by StdPseudoId;

run;

data want;

  set out1;

  set out2;

  set out3;

run;

If you want the variables reordered, you could do that with one additional datastep using a retain statement.

huanaca
Calcite | Level 5

Thanks very much for your response. It seems to have worked very well, with the exception of out1 -- which does not contain any observations for some reason. So then the final 'want' file contains no observations, though it has the correct variables. Is it possible this has something to do with CahseeTestDate04 being a character variable, whereas CahseeAttemptQty04 and CahseeScaledScore04 are number variables?

1111  data part1 (keep=StdPseudoId CahseeTestDate04 _name_);

1112    set have;

1113    _name_=cats("CahseeTestDate04",CahseeTestName04);

1114  run;

NOTE: There were 85651 observations read from the data set WORK.HAVE.

NOTE: The data set WORK.PART1 has 85651 observations and 3 variables.

NOTE: Compressing data set WORK.PART1 decreased size by 76.62 percent.

      Compressed is 371 pages; un-compressed would require 1587 pages.

NOTE: DATA statement used (Total process time):

      real time           0.20 seconds

      cpu time            0.18 seconds

1115

1116  proc transpose data=part1 out=out1 (drop=_:);

1117    by StdPseudoId;

1118  run;

NOTE: No variables to transpose.

NOTE: There were 85651 observations read from the data set WORK.PART1.

NOTE: The data set WORK.OUT1 has 0 observations and 3 variables.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

      real time           0.60 seconds

      cpu time            0.60 seconds

1119

1120  data part2 (keep=StdPseudoId CahseeAttemptQty04 _name_);

1121    set have;

1122    _name_=cats("CahseeAttemptQty04",CahseeTestName04);

1123  run;

NOTE: There were 85651 observations read from the data set WORK.HAVE.

NOTE: The data set WORK.PART2 has 85651 observations and 3 variables.

NOTE: Compressing data set WORK.PART2 decreased size by 78.43 percent.

      Compressed is 330 pages; un-compressed would require 1530 pages.

NOTE: DATA statement used (Total process time):

      real time           0.17 seconds

      cpu time            0.17 seconds

1124

1125  proc transpose data=part2 out=out2 (drop=_:);

1126    by StdPseudoId;

1127  run;

NOTE: There were 85651 observations read from the data set WORK.PART2.

NOTE: The data set WORK.OUT2 has 44431 observations and 3 variables.

NOTE: Compressing data set WORK.OUT2 increased size by 19.62 percent.

      Compressed is 317 pages; un-compressed would require 265 pages.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

      real time           0.87 seconds

      cpu time            0.82 seconds

1128

1129  data part3 (keep=StdPseudoId CahseeScaledScore04 _name_);

1130    set have;

1131    _name_=cats("CahseeScaledScore04",CahseeTestName04);

1132  run;

NOTE: There were 85651 observations read from the data set WORK.HAVE.

NOTE: The data set WORK.PART3 has 85651 observations and 3 variables.

NOTE: Compressing data set WORK.PART3 decreased size by 77.19 percent.

      Compressed is 349 pages; un-compressed would require 1530 pages.

NOTE: DATA statement used (Total process time):

      real time           0.23 seconds

      cpu time            0.23 seconds

1133

1134  proc transpose data=part3 out=out3 (drop=_:);

1135    by StdPseudoId;

1136  run;

NOTE: There were 85651 observations read from the data set WORK.PART3.

NOTE: The data set WORK.OUT3 has 44431 observations and 3 variables.

NOTE: Compressing data set WORK.OUT3 increased size by 27.55 percent.

      Compressed is 338 pages; un-compressed would require 265 pages.

NOTE: PROCEDURE TRANSPOSE used (Total process time):

      real time           0.78 seconds

      cpu time            0.76 seconds

1137

1138  data want;

1139    set out1;

1140    set out2;

1141    set out3;

1142  run;

NOTE: There were 0 observations read from the data set WORK.OUT1.

NOTE: The data set WORK.WANT has 0 observations and 7 variables.

NOTE: DATA statement used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

art297
Opal | Level 21

Yes, that would indeed create a problem, but an easy one to fix.  Just replace the first transpose with:

proc transpose data=part1 out=out1 (drop=_:);

  var CahseeTestDate04;

  by StdPseudoId;

run;

huanaca
Calcite | Level 5

Thanks very much. That worked perfectly. I certainly appreciate your time. (I'm learning!)

Ksharp
Super User

Suppose  every StdPseudoId    only one  CahseeTestName04  .

data x;
input CahseeTestDate04  & $20.  CahseeAttemptQty04    StdPseudoId    CahseeTestName04 $   CahseeScaledScore04 ;
cards;
11 May 2004                        1                        100014886645            ELA                                    432
16 Mar 2004                        1                        100014886645             MAT                                   385
24 Apr 2004                         2                       200079994422             ELA                                    345
03 Apr 2004                         3                       450029746555             ELA                                    450
05 May 2004                        3                       450029746555             MAT                                   440
26 Mar 2004                         2                       770033338888             MAT                                   367
;
run;
proc sql noprint;
select distinct catt('x(rename=(CahseeTestDate04=CahseeTestDate04',strip(CahseeTestName04),
                            ' CahseeAttemptQty04=CahseeAttemptQty04',strip(CahseeTestName04),
                            ' CahseeScaledScore04=CahseeScaledScore04',strip(CahseeTestName04),
                     ') where=(CahseeTestName04="',CahseeTestName04,'"))' ) into : list separated by ' '
from x;
quit;

data want;
 merge &list ;
 by StdPseudoId;
run;

Ksharp

art297
Opal | Level 21

You had posted but apparently deleted a response that included your log.  While I can no longer check it from what I recall the error you were receiving was the result of having multiple records, for either ELA, MAT or both, that also had missing values for the StdPseudoId field.

The method will fail if any id has multiple records for either ELA or MAT.

huanaca
Calcite | Level 5

thanks, yeah. I had found two observations with missing IDs. When I deleted those it was ok.  However, I have another problem. I just replied to you with that one.

sas-innovate-2024.png

Today is the last day to save with the early bird rate! Register today for just $695 - $100 off the standard rate.

 

Plus, pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1295 views
  • 3 likes
  • 3 in conversation