Hello,
I am looking for a simpler solution to transform the HAVE data into the format shown in WANT. The data in HAVE needs to be split by type, transposed, then joined by id. In the transposition, the values of the type variable become the new columns and the amt variable holds the values to transpose. The id variable is the common variable in the join.
The data is ordered by type and id, ascending. There could be any number of types--right now, HAVE contains four ("a" through "d"). In HAVE, there are 5 rows for each type. Note that the id starting number for each type increments. For the first type, id goes from 1 to 5, then for the next type, id goes from 2 to 6, then from 3 to 7, and so on.
The number of rows for each type is always consistent within HAVE, although it may not always be 5; it could be 10, or 100. Hence the desire for a dynamic (not hard-coded) solution.
One approach is to split HAVE by type into intermediate datasets (i.e., all the type = "a" data, all the type = "b" data, etc.), transpose each of these, then join them all together by id. But I'm wondering if there is a simpler way that doesn't require numerous intermediate datasets and joins. (For the PROC TRANSPOSE pros: Can this all be done with a single PROC TRANSPOSE statement?)
Here is the starting data:
data have;
input type $ id amt;
datalines;
a 1 99
a 2 98
a 3 97
a 4 96
a 5 95
b 2 94
b 3 93
b 4 92
b 5 91
b 6 90
c 3 89
c 4 88
c 5 87
c 6 86
c 7 85
d 4 84
d 5 83
d 6 82
d 7 81
d 8 80
;
run;
This is the desired output:
data want;
input id a b c d;
datalines;
1 99 . . .
2 98 94 . .
3 97 93 89 .
4 96 92 88 84
5 95 91 87 83
6 . 90 86 82
7 . . 85 81
8 . . . 80
;
run;
A straightforward transpose seems to work correctly here. Is there something in this output that doesn't align with what you're looking to achieve?
proc sort data=have;
by id;
run;
proc transpose data=have out=want;
by id;
id type;
var amt;
run;
Thanks, @Reeza -- yes, this simple transpose is working great! It turns out my actual dataset had been sorted incorrectly (not by id), so the transpose was not working properly. I was definitely overthinking this. Thanks for your second set of eyes.
I don't know what you think a "split" would be.
For your example Have data set:
proc sort data=have;
by id type;
run;
proc transpose data=have out=trans (drop=_name_);
by id;
id type;
var amt;
run;
seems to work as desired.
Warning: if any ID has more than one row for the same Type, this won't work. PROC TRANSPOSE stops with an error saying that the ID value occurs twice in the same BY group.
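One way to check for that situation before transposing is to count the rows per id/type pair. This is only a minimal sketch against the HAVE data set from the question; the column alias n_rows is made up for illustration.

```sas
/* List any id/type pair that appears more than once in HAVE. */
/* n_rows is an illustrative alias, not a name from the original post. */
proc sql;
  select id, type, count(*) as n_rows
  from have
  group by id, type
  having count(*) > 1;
quit;
```

With the sample data the query should select no rows, since every id/type pair is unique. If it does list pairs, they would need to be summarized or de-duplicated before the transpose can use type as the ID variable.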