I have some data that looks the table below, except that my columns for each group (A, B) are over 1,000 and there's thousands of rows.
The operation that I want to perform is summing specific columns (e.g., A1 + B1, A2+B2, A3+B3) to get new columns in the table (C1,C2,C3). I would like to repeat this operation for all columns.
Obs. | A1 | A2 | A3 | A4 | A5 | B1 | B2 | B3 | B4 | B5 |
1 | 206 | 138 | 201 | 240 | 107 | 222 | 115 | 59 | 103 | 200 |
2 | 85 | 217 | 33 | 43 | 138 | 236 | 66 | 169 | 126 | 286 |
3 | 35 | 250 | 255 | 38 | 292 | 74 | 140 | 123 | 23 | 58 |
4 | 245 | 103 | 135 | 121 | 56 | 246 | 82 | 219 | 230 | 127 |
5 | 34 | 136 | 190 | 210 | 167 | 273 | 144 | 135 | 150 | 299 |
6 | 41 | 286 | 58 | 259 | 25 | 167 | 6 | 258 | 20 | 192 |
7 | 295 | 148 | 237 | 184 | 267 | 221 | 128 | 246 | 29 | 62 |
8 | 82 | 78 | 282 | 131 | 165 | 90 | 258 | 166 | 48 | 257 |
9 | 277 | 219 | 139 | 115 | 230 | 147 | 166 | 155 | 219 | 213 |
10 | 67 | 240 | 38 | 255 | 95 | 166 | 286 | 270 | 282 | 20 |
I have tried crafting a small macro (see below), but I'm not getting the output I want as it keeps overwriting my previous column. Can someone help me diagnose my error, or offer a better way to approach this. Thanks
Turn on the MPRINT option and looking at the generated SAS statements will make the mistake much more clear.
You are generating code like:
data WANT;
set HAVE;
C_1 = A_1 + B_1 ;
run;
data WANT;
set HAVE;
C_2 = A_2 + B_2 ;
run;
data WANT;
set HAVE;
C_3 = A_3 + B_3 ;
run;
But you want to generate code like this instead:
data WANT;
set HAVE;
C_1 = A_1 + B_1 ;
C_2 = A_2 + B_2 ;
C_3 = A_3 + B_3 ;
run;
So you just need to change the placement of the macro %DO loop.
But really there is no need for the macro at all. Just use ARRAY statements and you can do it all without any need for code generation.
data WANT;
set HAVE;
array A_[500];
array B_[500];
array C_[500];
do i=1 to 500;
C_[i] = A_[i] + B_[i];
end;
run;
Or restructure the data so that instead of one observation with 500 * 3 variables you just have 500 observations or 3 variables (plus id variables).
data tall;
input id i a b ;
cards;
1 1 206 222
1 2 138 115
1 3 201 59
1 4 240 103
1 5 107 200
2 1 85 236
2 2 217 66
2 3 33 169
2 4 43 126
2 5 138 286
;
Now creating C is simple.
data want;
set tall;
c = a + b;
run;
Result:
If you have, e.g., 1000 columns of each, then:
data want;
set have;
array A {*} A1-A1000;
array B {*} B1-B1000;
array C {*} C1-C1000;
do i=1 to dim(A);
C[i]=A[i]+B[i];
end;
drop i;
run;
Turn on the MPRINT option and looking at the generated SAS statements will make the mistake much more clear.
You are generating code like:
data WANT;
set HAVE;
C_1 = A_1 + B_1 ;
run;
data WANT;
set HAVE;
C_2 = A_2 + B_2 ;
run;
data WANT;
set HAVE;
C_3 = A_3 + B_3 ;
run;
But you want to generate code like this instead:
data WANT;
set HAVE;
C_1 = A_1 + B_1 ;
C_2 = A_2 + B_2 ;
C_3 = A_3 + B_3 ;
run;
So you just need to change the placement of the macro %DO loop.
But really there is no need for the macro at all. Just use ARRAY statements and you can do it all without any need for code generation.
data WANT;
set HAVE;
array A_[500];
array B_[500];
array C_[500];
do i=1 to 500;
C_[i] = A_[i] + B_[i];
end;
run;
Or restructure the data so that instead of one observation with 500 * 3 variables you just have 500 observations or 3 variables (plus id variables).
data tall;
input id i a b ;
cards;
1 1 206 222
1 2 138 115
1 3 201 59
1 4 240 103
1 5 107 200
2 1 85 236
2 2 217 66
2 3 33 169
2 4 43 126
2 5 138 286
;
Now creating C is simple.
data want;
set tall;
c = a + b;
run;
Result:
Appreciate not only the soln's offered, but also the sufficient explanation of what I was doing incorrectly. Cheers.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.