I am new to the SAS community and appreciate any help you can provide.
I have 9 variables - H1 ... H9. I need to create new variables with the sums of each possible combination of those original 9.
Does anyone have code to complete that task? Thanks!
I take it you want all 2-element sums, 3-element sums, ..., up to the 9-element sum, right?
You could do some nested loops in a data step, but I'd suggest using PROC SUMMARY to generate a data set with all the combinations (from 1-way "combination" to 9-way). Then read that data set and calculate hsum in a single assignment:
data have;
id=1;
array h {9} (1,2,4,8,16,32,64,128,256);
output;
id=2;
do i=1 to 9; h{i}=2*h{i};end;
output;
run;
proc summary data=have (keep=id h1-h9 ) completetypes noprint missing chartype ;
by id;
class h1-h9 ;
output out=need / ways;
ways 1 to 9;
run;
data want;
set need (drop=_freq_);
hsum=sum(of h1-h9);
run;
Dataset NEED will have, for each ID, 511 observations (=2**9 - 1), one for each possible combination of H values. It will also have the variables _WAY_ (1 for a 1-way combo, 2 for a 2-way combo, etc.) and _TYPE_. Because of the CHARTYPE option, _TYPE_ will be a 9-character string of 1's and 0's indicating which H vars are present or missing.
In this particular example HSUM will have every integer value from 1 to 511 for ID=1. For ID=2, hsum will have every even value from 2 to 1022.
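A quick way to confirm those counts, assuming the WANT data set created above, is to summarize it by ID. The column aliases (n_combos, min_hsum, max_hsum) are just illustrative names:

```sas
/* Sanity check on WANT: one row per ID with the number of combinations */
/* and the smallest and largest HSUM.                                    */
proc sql;
  select id,
         count(*)  as n_combos,
         min(hsum) as min_hsum,
         max(hsum) as max_hsum
  from want
  group by id;
quit;
```

For the sample data this should report 511 combinations per ID, with HSUM running from 1 to 511 for ID=1 and from 2 to 1022 for ID=2.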
Thanks for the response. One clarification: I need to execute these summations across all 10,000 observations.
Does having more than one row change the code or the process?
If you take a close look at my example, you'll see that it treats 2 rows, not just one.
But the requirement of this approach is that you need some identifier variable(s) to uniquely identify each row.
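If the data have no such identifier, one minimal option is to derive it from the row number. The data set name HAVE_ID and the choice of _N_ as the key are assumptions for illustration, not part of the original example:

```sas
/* Add a unique row identifier so PROC SUMMARY can process each row */
/* as its own BY group.                                              */
data have_id;
  set have;
  id = _n_;   /* row number; already in ascending order for BY processing */
run;
```

PROC SUMMARY can then read HAVE_ID with BY id, exactly as in the example above.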
That will produce 2^9 output observations for each input observation. Are you sure you want this?
data have;
infile cards dlm=',';
input h1-h9;
cards;
1,2,4,8,16,32,64,128,256
;
run;
data want;
set have;
array h{*} h1-h9;
array x{*} x1-x9;
k=-1;
do i=1 to 2**dim(x);
rc=graycode(k,of x{*});
sum=0;
do j=1 to dim(x);
sum+x{j}*h{j};
end;
output;
end;
keep h1-h9 sum k;
run;
I hadn't heard of graycode before, so I looked it up on Wikipedia. I like the idea of using graycode to step through the combinations, but it would be nicer to avoid looping through all 9 products (h{j}*x{j}) to generate a sum for each graycode iteration.
And it seems that ought to be possible. SAS defines graycode as "generate all combinations of n items in minimal change order", and Wikipedia says graycode's intrinsic property is to change only one member of the combination at a time. I.e. each step either adds 1 element to the prior combination, or subtracts one.
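The one-change-per-step property can be seen in the standard reflected binary Gray code, where step i maps to i XOR (i shifted right by one bit). The sketch below prints the first eight codes using the SAS bitwise functions BXOR and BRSHIFT. It illustrates the general idea only; it is not taken from the GRAYCODE documentation, so the ordering GRAYCODE itself uses should be checked separately:

```sas
/* Reflected binary Gray code for 3 items: gray = i XOR (i >> 1). */
/* Each printed code differs from the previous one in one bit.    */
data _null_;
  do i = 0 to 7;
    gray = bxor(i, brshift(i, 1));
    put i= gray=binary3.;
  end;
run;
```

Reading the log from top to bottom, each line flips a single bit of the previous code, which corresponds to adding or removing one element from the combination.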
That suggests that, to take full advantage of graycode, one could iteratively update the sum (by either adding or subtracting one H value) instead of calculating it from scratch. What I don't immediately see is how best to identify the added or removed element, but doing so could improve efficiency a lot, especially for large datasets and long arrays.
@mkeintz I agree with you. That would lead to using SAS/IML, but I doubt the OP has the IML product.
Here's a relatively simple way to do the task in a data step:
data have;
input id h1-h9;
datalines;
1 256 128 64 32 15 8 4 2 1
2 512 256 128 64 32 16 8 4 2
3 1 2 4 8 16 32 64 128 256
4 2 4 8 16 32 64 128 256 512
run;
%let dim=9;
%let ncombo=%eval(2**&dim);
data want (keep=id i h:);
if _n_=1 then do;
%grcode_setup(size=&dim);
end;
set have;
array h{&dim};
hcount=0;
hsum=0;
do i=1 to &ncombo;
hcount= hcount + _graycode_sign{i};
hsum = hsum + h{_graycode_element{i}}*_graycode_sign{i} ;
output;
end;
run;
The IF "_n_=1" block calls a macro that iterates via a graycode progression through all combinations of &DIM items. But instead of revising an array of dummies (as the SAS GRAYCODE function does), it uses the underlying graycode algorithm to build two other arrays, focused on the element to be added to or removed from the combination:
(1) _graycode_element{i}, the element to add or remove at the i'th iteration
(2) _graycode_sign{i}, -1 (remove) or +1 (add) for the i'th iteration
The data step then uses these arrays to identify which element to add or remove, so it can maintain a running HSUM.
Also if you actually do want to maintain an array of dummies, as in the graycode function, just add
array dum{&dim} (&dim*0);
prior to the do loop. And inside the do loop add:
dum{_graycode_element{i}} = dum{_graycode_element{i}} + _graycode_sign{i};
Here's the macro being called (it must be compiled before the data step above is run):
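%macro grcode_setup(size=);
%local size nc;
%let nc= %eval(2**&size);
/* Element that changes at each step; step 1 is the empty set (sign 0) */
array _graycode_element{&nc} _temporary_ (&nc*1);
/* +1 = element added, -1 = element removed, 0 = no change */
array _graycode_sign{&nc} _temporary_ (&nc*0);
do _digit=1 to &size;
/* _d starts positive, so the first change for each element is an add */
_d=_digit;
/* Element _digit changes at step 2**(_digit-1)+1, then every 2**_digit steps */
do _i = 2**(_digit-1)+1 to &nc by 2**_digit;
_graycode_element{_i}=_digit;
_graycode_sign{_i}=sign(_d);
_d=-1*_d;   /* alternate between add and remove */
end;
end;
drop _digit _d _i;
%mend grcode_setup;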
Notice the macro does not populate the arrays in order from i=1 to &NC. Instead, it first identifies all the iterations in which the first element changes, then all those for the second element, and so on. For element 1, the sequence of values is 0110..., meaning it changes every 2nd iteration, starting with iteration 2. For element 2, the sequence is 00111100... (changing every 4th iteration, starting with iteration 3). The same pattern continues for the remaining elements.
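To see that pattern on a small case, the arrays can be printed for 3 items. This is a sketch that assumes the %grcode_setup macro above has already been compiled; the variable names el and sgn are illustrative:

```sas
/* Print the element/sign schedule that %grcode_setup builds for 3 items. */
data _null_;
  %grcode_setup(size=3);
  do i = 1 to 8;
    el  = _graycode_element{i};   /* element changed at step i */
    sgn = _graycode_sign{i};      /* +1 add, -1 remove, 0 none */
    put i= el= sgn=;
  end;
run;
```

Step 1 should show sgn=0 (the empty combination). Element 1 should then appear at steps 2, 4, 6 and 8 with alternating signs, element 2 at steps 3 and 7, and element 3 at step 5.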
"What I don't immediately see is how to best identify the added or removed element, but one could improve efficiency a lot."
RC marks the spot:
data have;
infile cards dlm=',';
input h1-h9;
cards;
1,2,4,8,16,32,64,128,256
;
run;
data want2;
set have;
array h{*} h1-h9;
array x{*} x1-x9;
k=-1;
sum=0;
rc=graycode(k,of x{*});
output;
do i=1 to 2**dim(x)-1;
rc=graycode(k,of x{*});
if x{rc} then sum=sum+h{rc};
else sum=sum-h{rc};
output;
end;
keep h1-h9 sum k;
run;
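To see what rc returns, a small trace with three items can help. This is a sketch that assumes GRAYCODE returns 0 on the initializing call (when k is negative) and otherwise the position of the variable that changed, which is the behavior the code above relies on:

```sas
/* Trace GRAYCODE over all subsets of 3 items, writing to the log */
/* the return code, the subset size k, and the indicator values.  */
data _null_;
  array x{3} x1-x3;
  k = -1;                          /* negative k requests initialization */
  do i = 1 to 2**dim(x);
    rc = graycode(k, of x{*});
    put i= rc= k= x1= x2= x3=;
  end;
run;
```

On each line after the first, rc should point at the single x variable whose value differs from the previous line, so only h{rc} needs to be added to or subtracted from the running sum.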
How about taking a single observation with 4 variables instead of 9 and writing out the desired output, so we can see what you want in a more concrete example?
I'm not sure that some of the comments about the number of variables involved are sinking in, and doing this by hand for a smaller set may demonstrate the concerns raised.