Solved: Dynamically split and transpose data, joining by ID.

mklangley · Posted 03-14-2022 01:37 PM

Hello,

I am looking for a simpler solution to transform the HAVE data into the format shown in WANT. The data in HAVE needs to be split by type, transposed, then joined by id. In the transposition, the type variable supplies the new column names and the amt variable holds the values to transpose. The id variable is the common key for the join.

The data is sorted by type and then by id, both ascending. There could be any number of types; right now, HAVE contains four ("a" through "d"), with 5 rows for each type. Note that the starting id increases by one for each type: for the first type, id goes from 1 to 5; for the next type, from 2 to 6; then from 3 to 7; and so on.

The number of rows per type is always consistent within HAVE, although it may not always be 5; it could be 10 or 100. Hence the desire for a dynamic (not hard-coded) solution.

One approach is to split HAVE by type into intermediate datasets (i.e., all the type = "a" data, all the type = "b" data, etc.), transpose each of these, then join them all together by id. But I'm wondering if there is a simpler way that doesn't require numerous intermediate datasets and joins. (For the PROC TRANSPOSE pros: can this all be done in a single PROC TRANSPOSE step?)

Here is the starting data:

data have;
    input type $ id amt;
    datalines;
a 1 99
a 2 98
a 3 97
a 4 96
a 5 95
b 2 94
b 3 93
b 4 92
b 5 91
b 6 90
c 3 89
c 4 88
c 5 87
c 6 86
c 7 85
d 4 84
d 5 83
d 6 82
d 7 81
d 8 80
    ;
run;

This is the desired output:

data want;
    input id a b c d;
    datalines;
1 99 .  .  .
2 98 94 .  .
3 97 93 89 .
4 96 92 88 84
5 95 91 87 83
6 .  90 86 82
7 .  .  85 81
8 .  .  .  80
    ;
run;

Reeza · Posted 03-14-2022 01:41 PM

A straightforward transpose seems to work correctly here. Is there something in this output that doesn't align with what you're looking to achieve?

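/* PROC TRANSPOSE with a BY statement requires the data sorted by id. */
proc sort data=have;
   by id;
run;

/* One output row per id; each value of type becomes a column holding amt. */
/* DROP=_NAME_ removes the _NAME_ column, which would otherwise hold "amt". */
proc transpose data=have out=want (drop=_name_);
   by id;
   id type;
   var amt;
run;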
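One way to confirm that this transpose really is dynamic is to feed it more types and more rows per type than the sample has. The sketch below builds hypothetical test data following the pattern described in the question (type number t has ids t through t + nrows - 1). The macro variables NTYPES and NROWS and the dataset names HAVE_BIG and WANT_BIG are made up for illustration, and the amounts simply count down from 99, which happens to reproduce the sample HAVE when NTYPES=4 and NROWS=5. The sort and transpose steps follow the same pattern as the solution, with DROP=_NAME_ so the output matches WANT; nothing in them depends on the number of types or rows.

```sas
%let ntypes = 5;   /* hypothetical number of types        */
%let nrows  = 10;  /* hypothetical number of rows per type */

/* Hypothetical test data: type t covers ids t to t+nrows-1. */
data have_big;
   length type $ 8;
   do t = 1 to &ntypes;
      /* types are named a, b, c, ... */
      type = substr('abcdefghijklmnopqrstuvwxyz', t, 1);
      do id = t to t + &nrows - 1;
         n + 1;          /* running row counter (sum statement) */
         amt = 100 - n;  /* amounts count down from 99          */
         output;
      end;
   end;
   keep type id amt;
run;

/* BY id in PROC TRANSPOSE requires the data sorted by id. */
proc sort data=have_big;
   by id;
run;

/* Same transpose as for HAVE: no type names or counts are hard-coded. */
proc transpose data=have_big out=want_big (drop=_name_);
   by id;
   id type;
   var amt;
run;
```

With NTYPES=5 and NROWS=10, WANT_BIG should have 14 rows (ids 1 through 14) and columns a through e, with missing values wherever a type has no row for that id.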

mklangley · Posted 03-14-2022 02:02 PM

Thanks, @Reeza: yes, this simple transpose is working great! It turns out my actual dataset had been sorted incorrectly (not by id), so the transpose was not working properly. I was definitely overthinking this. Thanks for the second set of eyes.

ballardw · Posted 03-14-2022 01:43 PM

I don't know what you think a "split" would be.

For your example Have data set:

proc sort data=have;
   by id type;
run;

proc transpose data=have out=trans (drop=_name_);
   by id;
   id type;
   var amt;
run;

seems to work as desired.

Warning: if any id has more than one row with the same type, this won't work; PROC TRANSPOSE stops with an error because the ID value occurs twice in the same BY group.
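Before transposing real data, it may be worth confirming that no id has two rows with the same type. A minimal sketch, assuming a small hypothetical dataset HAVE_DUPS in which id 2 has two rows of type b; the PROC SQL query lists every id/type pair that occurs more than once (DUP_CHECK is likewise a made-up name). On the real data, an empty DUP_CHECK means the transpose above should not hit that error.

```sas
/* Hypothetical data: id 2 has two rows of type b. */
data have_dups;
   input type $ id amt;
   datalines;
a 1 99
a 2 98
b 2 94
b 2 93
;
run;

/* List each id/type pair that occurs more than once. */
proc sql;
   create table dup_check as
   select id, type, count(*) as n_rows
   from have_dups
   group by id, type
   having count(*) > 1;
quit;
```

If keeping only the last value for a duplicated id/type pair is acceptable, the LET option on the PROC TRANSPOSE statement allows duplicate ID values within a BY group and transposes the last occurrence; otherwise the duplicates need to be resolved (for example, summed) before transposing.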

mklangley · Posted 03-14-2022 02:03 PM

This works, too. Thank you!
