Hello,
I would greatly appreciate help transforming my data from long to wide. I have not be able to work out the code. Thank you in advance for your help.
It is a very large data set, but the organization looks like this:
VARIABLES:
CahseeTestDate04 (Character)
CahseeAttemptQty04 (Num)
StdPseudoId (Num)
CahseeTestName04 (Char)
CahseeScaledScore04 (Num)
WHAT I HAVE:
CahseeTestDate04 CahseeAttemptQty04 StdPseudoId CahseeTestName04 CahseeScaledScore04
11 May 2004 1 100014886645 ELA 432
16 Mar 2004 1 100014886645 MAT 385
24 Apr 2004 2 200079994422 ELA 345
03 Apr 2004 3 450029746555 ELA 450
05 May 2004 3 450029746555 MAT 440
26 Mar 2004 2 770033338888 MAT 367
WHAT I WANT:
StdPseudoId | CahseeTestDate04ELA | CahseeAttemptQty04ELA | CahseeScaledScore04ELA | CahseeTestDate04MAT | CahseeAttemptQty04MAT | CahseeScaledScore04MAT |
---|---|---|---|---|---|---|
100014886645 | 11 May 2004 | 1 | 432 | 16 Mar 2004 | 1 | 385 |
200079994422 | 24 Apr 2004 | 2 | 345 | |||
450029746555 | 03 Apr 2004 | 3 | 450 | 05 May 2004 | 3 | 440 |
770033338888 | 26 Mar 2004 | 2 | 367 |
There is probably a more direct way but, at least, the following will achieve what you want:
data part1 (keep=StdPseudoId CahseeTestDate04 _name_);
set have;
_name_=cats("cahseeTestDate04",CahseeTestName04);
run;
proc transpose data=part1 out=out1 (drop=_:);
by StdPseudoId;
run;
data part2 (keep=StdPseudoId CahseeAttemptQty04 _name_);
set have;
_name_=cats("CahseeAttemptQty04",CahseeTestName04);
run;
proc transpose data=part2 out=out2 (drop=_:);
by StdPseudoId;
run;
data part3 (keep=StdPseudoId CahseeScaledScore04 _name_);
set have;
_name_=cats("CahseeScaledScore04",CahseeTestName04);
run;
proc transpose data=part3 out=out3 (drop=_:);
by StdPseudoId;
run;
data want;
set out1;
set out2;
set out3;
run;
If you want the variables reordered, you could do that with one additional datastep using a retain statement.
There is probably a more direct way but, at least, the following will achieve what you want:
data part1 (keep=StdPseudoId CahseeTestDate04 _name_);
set have;
_name_=cats("cahseeTestDate04",CahseeTestName04);
run;
proc transpose data=part1 out=out1 (drop=_:);
by StdPseudoId;
run;
data part2 (keep=StdPseudoId CahseeAttemptQty04 _name_);
set have;
_name_=cats("CahseeAttemptQty04",CahseeTestName04);
run;
proc transpose data=part2 out=out2 (drop=_:);
by StdPseudoId;
run;
data part3 (keep=StdPseudoId CahseeScaledScore04 _name_);
set have;
_name_=cats("CahseeScaledScore04",CahseeTestName04);
run;
proc transpose data=part3 out=out3 (drop=_:);
by StdPseudoId;
run;
data want;
set out1;
set out2;
set out3;
run;
If you want the variables reordered, you could do that with one additional datastep using a retain statement.
Thanks very much for your response. It seems to have worked very well, with the exception of out1 -- which does not contain any observations for some reason. So then the final 'want' file contains no observations, though it has the correct variables. Is it possible this has something to do with CahseeTestDate04 being a character variable, whereas CahseeAttemptQty04 and CahseeScaledScore04 are number variables?
1111 data part1 (keep=StdPseudoId CahseeTestDate04 _name_);
1112 set have;
1113 _name_=cats("CahseeTestDate04",CahseeTestName04);
1114 run;
NOTE: There were 85651 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.PART1 has 85651 observations and 3 variables.
NOTE: Compressing data set WORK.PART1 decreased size by 76.62 percent.
Compressed is 371 pages; un-compressed would require 1587 pages.
NOTE: DATA statement used (Total process time):
real time 0.20 seconds
cpu time 0.18 seconds
1115
1116 proc transpose data=part1 out=out1 (drop=_:);
1117 by StdPseudoId;
1118 run;
NOTE: No variables to transpose.
NOTE: There were 85651 observations read from the data set WORK.PART1.
NOTE: The data set WORK.OUT1 has 0 observations and 3 variables.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 0.60 seconds
cpu time 0.60 seconds
1119
1120 data part2 (keep=StdPseudoId CahseeAttemptQty04 _name_);
1121 set have;
1122 _name_=cats("CahseeAttemptQty04",CahseeTestName04);
1123 run;
NOTE: There were 85651 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.PART2 has 85651 observations and 3 variables.
NOTE: Compressing data set WORK.PART2 decreased size by 78.43 percent.
Compressed is 330 pages; un-compressed would require 1530 pages.
NOTE: DATA statement used (Total process time):
real time 0.17 seconds
cpu time 0.17 seconds
1124
1125 proc transpose data=part2 out=out2 (drop=_:);
1126 by StdPseudoId;
1127 run;
NOTE: There were 85651 observations read from the data set WORK.PART2.
NOTE: The data set WORK.OUT2 has 44431 observations and 3 variables.
NOTE: Compressing data set WORK.OUT2 increased size by 19.62 percent.
Compressed is 317 pages; un-compressed would require 265 pages.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 0.87 seconds
cpu time 0.82 seconds
1128
1129 data part3 (keep=StdPseudoId CahseeScaledScore04 _name_);
1130 set have;
1131 _name_=cats("CahseeScaledScore04",CahseeTestName04);
1132 run;
NOTE: There were 85651 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.PART3 has 85651 observations and 3 variables.
NOTE: Compressing data set WORK.PART3 decreased size by 77.19 percent.
Compressed is 349 pages; un-compressed would require 1530 pages.
NOTE: DATA statement used (Total process time):
real time 0.23 seconds
cpu time 0.23 seconds
1133
1134 proc transpose data=part3 out=out3 (drop=_:);
1135 by StdPseudoId;
1136 run;
NOTE: There were 85651 observations read from the data set WORK.PART3.
NOTE: The data set WORK.OUT3 has 44431 observations and 3 variables.
NOTE: Compressing data set WORK.OUT3 increased size by 27.55 percent.
Compressed is 338 pages; un-compressed would require 265 pages.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 0.78 seconds
cpu time 0.76 seconds
1137
1138 data want;
1139 set out1;
1140 set out2;
1141 set out3;
1142 run;
NOTE: There were 0 observations read from the data set WORK.OUT1.
NOTE: The data set WORK.WANT has 0 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Yes, that would indeed create a problem, but an easy one to fix. Just replace the first transpose with:
proc transpose data=part1 out=out1 (drop=_:);
var CahseeTestDate04;
by StdPseudoId;
run;
Thanks very much. That worked perfectly. I certainly appreciate your time. (I'm learning!)
Suppose every StdPseudoId only one CahseeTestName04 .
data x; input CahseeTestDate04 & $20. CahseeAttemptQty04 StdPseudoId CahseeTestName04 $ CahseeScaledScore04 ; cards; 11 May 2004 1 100014886645 ELA 432 16 Mar 2004 1 100014886645 MAT 385 24 Apr 2004 2 200079994422 ELA 345 03 Apr 2004 3 450029746555 ELA 450 05 May 2004 3 450029746555 MAT 440 26 Mar 2004 2 770033338888 MAT 367 ; run; proc sql noprint; select distinct catt('x(rename=(CahseeTestDate04=CahseeTestDate04',strip(CahseeTestName04), ' CahseeAttemptQty04=CahseeAttemptQty04',strip(CahseeTestName04), ' CahseeScaledScore04=CahseeScaledScore04',strip(CahseeTestName04), ') where=(CahseeTestName04="',CahseeTestName04,'"))' ) into : list separated by ' ' from x; quit; data want; merge &list ; by StdPseudoId; run;
Ksharp
You had posted but apparently deleted a response that included your log. While I can no longer check it from what I recall the error you were receiving was the result of having multiple records, for either ELA, MAT or both, that also had missing values for the StdPseudoId field.
The method will fail if any id has multiple records for either ELA or MAT.
thanks, yeah. I had found two observations with missing IDs. When I deleted those it was ok. However, I have another problem. I just replied to you with that one.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.