Creating Sum Variables For All Combinations Of 9 Variables

New Contributor
Posts: 2

I am new to the SAS community and appreciate any help you can provide.

I have 9 variables - H1 ... H9.  I need to create new variables with the sums of each possible combination of those original 9.

Does anyone have code to complete that task?  Thanks!

Trusted Advisor
Posts: 1,312

Re: Creating Sum Variables For All Combinations Of 9 Variables

Posted in reply to whajjar71

I take it you want all 2-element sums, 3-element sums, ..., and the 9-element sum, right?

 

You could do some nested loops in a data step, but I'd suggest using PROC SUMMARY to generate a data set with all the combinations (from 1-way "combination" to 9-way).  Then read that data set and calculate hsum in a single assignment:

 

data have;
  id=1;
  array h {9} (1,2,4,8,16,32,64,128,256);
  output;
  id=2;
  do i=1 to 9;  h{i}=2*h{i};end;
  output;
run;

proc summary data=have (keep=id h1-h9 ) completetypes noprint missing chartype ;
  by id;
  class h1-h9  ;
  output out=need / ways;
  ways 1 to 9;
run;

data want;
  set need (drop=_freq_);
  hsum=sum(of h1-h9);
run;

 

 

Dataset NEED will have, for each ID, 511 observations (= 2**9 - 1), one for each possible combination of H values. It will also have the variables _WAY_ (1 for a 1-way combo, 2 for a 2-way combo, etc.) and _TYPE_. _TYPE_ will be a 9-character string of 1's and 0's corresponding to which H variables are present or missing.

 

In this particular example HSUM will have every integer value from 1 to 511 for ID=1.   For ID=2, hsum will have every even value from 2 to 1022.
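
For a result small enough to check by eye, here is a minimal sketch of the same approach scaled down to three variables. The dataset names HAVE3, NEED3 and WANT3 are made up for illustration; everything else mirrors the steps above. It should give 7 rows (= 2**3 - 1), with HSUM taking each value from 1 through 7 once.

```sas
/* One observation with three values whose subset sums are all distinct */
data have3;
  id=1; h1=1; h2=2; h3=4;
run;

/* Same options as the 9-variable version, restricted to 1-way through 3-way combinations */
proc summary data=have3 completetypes noprint missing chartype;
  by id;
  class h1-h3;
  ways 1 to 3;
  output out=need3 / ways;
run;

/* Class variables left out of a combination are missing, and SUM() ignores missing values */
data want3;
  set need3 (drop=_freq_);
  hsum=sum(of h1-h3);
run;

proc print data=want3 noobs;
  var id _way_ _type_ h1-h3 hsum;
run;
```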

New Contributor
Posts: 2

Re: Creating Sum Variables For All Combinations Of 9 Variables

Thanks for the response. One clarification: I need to compute these sums for each of 10,000 observations.

Does having more than one row change the code or process?

Trusted Advisor
Posts: 1,312

Re: Creating Sum Variables For All Combinations Of 9 Variables

Posted in reply to whajjar71

If you take a close look at my example, you'll see that it treats 2 rows, not just one.

 

 

But the requirement of this approach is that you need some identifier variable(s) to uniquely identify each row.
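
If the real data has no such identifier, one option is to manufacture one from the row number. A minimal sketch, assuming the 10,000-row dataset is named HAVE and has no variable called ID yet (HAVE_ID is a made-up name):

```sas
data have_id;
  set have;
  id = _n_;   /* _n_ counts DATA step iterations; with a single SET it equals the row number */
run;
```

Because ID is assigned in row order, HAVE_ID is already in sorted order for a BY ID statement.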

Super User
Posts: 10,695

Re: Creating Sum Variables For All Combinations Of 9 Variables

Posted in reply to whajjar71

That will produce 2^9 observations for each input observation. Are you sure you want this?
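
If the goal really is new variables on the same row rather than new observations, a nested-loop data step can do that. This is only a sketch: the names WANT_WIDE, S1-S511, M and J are made up, and it assumes HAVE holds H1-H9. Sum variable S{m} covers the subset whose members are the 1-bits of m, so S1 = H1, S2 = H2, S3 = H1 + H2, and so on up to S511, the sum of all nine.

```sas
data want_wide;
  set have;
  array h{9} h1-h9;
  array s{511} s1-s511;          /* one sum variable per non-empty subset */
  do m=1 to 511;                 /* the binary digits of m pick the subset */
    s{m}=0;
    do j=1 to 9;
      /* bit j of m is set when BAND(m, 2**(j-1)) is nonzero */
      if band(m, 2**(j-1)) > 0 then s{m}=sum(s{m}, h{j});
    end;
  end;
  drop m j;
run;
```

With 10,000 input rows this keeps 10,000 rows but adds 511 columns to each.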

Super User
Posts: 10,695

Re: Creating Sum Variables For All Combinations Of 9 Variables

Posted in reply to whajjar71
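
data have;
  infile cards dlm=',';
  input h1-h9;
cards;
1,2,4,8,16,32,64,128,256
;
run;

data want;
  set have;
  array h{*} h1-h9;
  array x{*} x1-x9;   /* 0/1 indicators: is h{j} in the current subset? */
  k=-1;               /* negative k makes the first GRAYCODE call start from the empty subset */
  do i=1 to 2**dim(x);
    rc=graycode(k,of x{*});
    sum=0;
    do j=1 to dim(x);
      sum+x{j}*h{j};  /* adds h{j} only when its indicator is 1 */
    end;
    output;
  end;
  keep h1-h9 sum k;
run;
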
Trusted Advisor
Posts: 1,312

Re: Creating Sum Variables For All Combinations Of 9 Variables

@Ksharp

 

I hadn't heard of graycode before, so I looked it up on Wikipedia. I like the idea of using graycode to step through the combinations, but it would be nicer to avoid looping through all 9 products (h{j}*x{j}) to generate a sum at each graycode iteration.

 

And it seems that ought to be possible.  SAS defines graycode as "generate all combinations of n items in minimal change order", and Wikipedia says graycode's intrinsic property is to change only one member of the combination at a time.  I.e. each step either adds 1 element to the prior combination, or subtracts one.

 

That suggests that, to take full advantage of graycode, one could update the sum iteratively (by either adding or subtracting one H value) instead of calculating it from scratch. What I don't immediately see is how to best identify the added or removed element, but one could improve efficiency a lot. Especially for large datasets and long arrays.

Super User
Posts: 10,695

Re: Creating Sum Variables For All Combinations Of 9 Variables

@mkeintz
I agree with you. That would mean using SAS/IML, and I doubt the OP has the IML product.
Trusted Advisor
Posts: 1,312

Re: Creating Sum Variables For All Combinations Of 9 Variables

Here's a relatively simple way to do the task in a data step:

 

data have;
  input id h1-h9;
datalines;
1 256 128  64 32 15 8  4 2 1
2 512 256 128 64 32 16 8 4 2
3 1 2 4  8 16 32  64 128 256
4 2 4 8 16 32 64 128 256 512
;
run;

%let dim=9;
%let ncombo=%eval(2**&dim);
data want (keep=id i h:);
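  if _n_=1 then do;
    /* Build the lookup arrays once; %grcode_setup (defined below) must be compiled before this step runs */
    %grcode_setup(size=&dim);
  end;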
  set have;
  array h{&dim};
  hcount=0;
  hsum=0;

  do i=1 to &ncombo;
    hcount= hcount + _graycode_sign{i};
    hsum =  hsum + h{_graycode_element{i}}*_graycode_sign{i} ;
    output;
  end;
run;

 

 

The "IF _n_=1" block calls a macro that iterates via a graycode progression through all combinations of &DIM items. But instead of revising an array of dummies (as the SAS GRAYCODE function does), it uses the underlying graycode algorithm to build two other arrays, focused on the element to be added to or removed from the combination:

 (1) _graycode_element{i}, the element to add or remove at the i'th iteration

 (2) _graycode_sign{i}, -1 (remove) or +1 (add) for the i'th iteration (0 at i=1, the empty combination)

 

Use these arrays later to identify the element to add or remove, and so maintain a running HSUM.

 

Also if you actually do want to maintain an array of dummies, as in the graycode function, just add

    array dum{&dim} (&dim*0);

prior to the do loop.   And  inside the do loop add:

   dum{_graycode_element{i}} = dum{_graycode_element{i}} + _graycode_sign{i};

 

Here's the macro being called:

 

%macro grcode_setup(size=);
  %local size nc;
  %let nc= %eval(2**&size);

  array _graycode_element{&nc} _temporary_ (&nc*1);
  array _graycode_sign{&nc}    _temporary_ (&nc*0);
  do _digit=1 to &size;
    _d=_digit;
    do _i = 2**(_digit-1)+1 to &nc by 2**_digit;
      _graycode_element{_i}=_digit; 
      _graycode_sign{_i}=sign(_d);
      _d=-1*_d;
    end;
  end;
  drop _digit _d _i;
%mend grcode_setup;

 

Notice that the macro does not populate the arrays in order from I=1 to &NC. Instead, it first identifies all the iterations at which the first element of the combination changes, then does the same for the second element, and so on. For element 1, the dummy sequence is 0110 (repeating), meaning it changes every 2nd iteration, starting with iteration 2. For element 2, the sequence is 00111100 (changing every 4th iteration, starting with iteration 3). In general, element d changes every 2**d iterations, starting with iteration 2**(d-1)+1.
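
To check the macro's output on a small case, a sketch like the following lists both arrays for three items. It assumes %grcode_setup has already been compiled; GC_CHECK, ELEM and SGN are made-up names.

```sas
data gc_check;
  %grcode_setup(size=3);          /* creates the two _temporary_ arrays, 2**3 = 8 entries each */
  do i=1 to 8;
    elem=_graycode_element{i};    /* which item changes at iteration i */
    sgn =_graycode_sign{i};       /* +1 = add, -1 = remove, 0 = no change (iteration 1) */
    output;
  end;
run;

proc print data=gc_check noobs;
run;
```

Tracing the macro by hand, the (ELEM, SGN) pairs for I=1 to 8 come out as (1,0), (1,1), (2,1), (1,-1), (3,1), (1,1), (2,-1), (1,-1). That corresponds to the combinations {}, {1}, {1,2}, {2}, {2,3}, {1,2,3}, {1,3}, {3}, each differing from the previous one by a single item.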

PROC Star
Posts: 254

Re: Creating Sum Variables For All Combinations Of 9 Variables

What I don't immediately see is how to best identify the added or removed element, but one could improve efficiency a lot. 

RC marks the spot:

data have;
infile cards dlm=',';
input h1-h9;
cards;
1,2,4,8,16,32,64,128,256
;
run;
data want2;
  set have;
  array h{*} h1-h9;
  array x{*} x1-x9;
  k=-1;
  sum=0;
  rc=graycode(k,of x{*});
  output;
  do i=1 to 2**dim(x)-1;
    rc=graycode(k,of x{*});
    if x{rc} then
      sum=sum+h{rc};
    else
      sum=sum-h{rc};
    output;
  end;
  keep h1-h9 sum k;
run;
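
The role of RC can be seen on a small case. A minimal sketch with three items, which writes each step to the log; it relies on the same behaviour as the step above, namely that GRAYCODE returns 0 on the initializing call and otherwise the index of the item it just added or removed.

```sas
data _null_;
  array x{3} x1-x3;
  k=-1;                          /* negative k: first call initializes the empty subset */
  do i=1 to 2**dim(x);
    rc=graycode(k, of x{*});
    put i= k= x1= x2= x3= rc=;   /* after the first line, exactly one of x1-x3 flips per step */
  end;
run;
```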

 

Super User
Posts: 13,350

Re: Creating Sum Variables For All Combinations Of 9 Variables

Posted in reply to whajjar71

How about showing, for a single observation with 4 variables instead of 9, the output you want, so we can see a more concrete example of what you have in mind?

I'm not sure that some of the comments about the number of variables involved are sinking in; doing this by hand for a smaller set may demonstrate the concerns raised.
