whajjar71
Calcite | Level 5

I am new to the SAS community and appreciate any help you can provide.

I have 9 variables - H1 ... H9.  I need to create new variables with the sums of each possible combination of those original 9.

Does anyone have code to complete that task?  Thanks!

mkeintz
PROC Star

I take it you want all 2-element sums, 3-element sums, ..., up to the 9-element sum, right?

You could do this with nested loops in a DATA step, but I'd suggest using PROC SUMMARY to generate a data set with all the combinations (from 1-way "combinations" to 9-way). Then read that data set and calculate HSUM in a single assignment:

data have;
  id=1;
  array h {9} (1,2,4,8,16,32,64,128,256);
  output;
  id=2;
  do i=1 to 9;  h{i}=2*h{i};end;
  output;
run;

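proc summary data=have (keep=id h1-h9) completetypes noprint missing chartype;
  by id;                   /* each ID (row) is summarized separately            */
  class h1-h9;             /* every H variable is its own class variable        */
  output out=need / ways;  /* WAYS adds _WAY_: how many H vars are in the combo */
  ways 1 to 9;             /* request all 1-way through 9-way combinations      */
run;

data want;
  set need (drop=_freq_);
  /* H vars outside the combination are missing; SUM ignores missing values */
  hsum=sum(of h1-h9);
run;
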
Data set NEED will have, for each ID, 511 observations (= 2**9 - 1), one for each possible combination of H values. It will also have the variables _WAY_ (1 for a 1-way combination, 2 for a 2-way combination, etc.) and _TYPE_. Because of the CHARTYPE option, _TYPE_ is a 9-character string of 1's and 0's showing which H variables are present (1) or missing (0) in that combination.

In this particular example, HSUM will have every integer value from 1 to 511 for ID=1. For ID=2, HSUM will have every even value from 2 to 1022.
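Both of these numbers follow from a little arithmetic. A combination is a non-empty subset of the 9 H variables and _WAY_ is its size k, so the rows per ID add up as in the first line below. For the second line, b_j is notation introduced here (not a variable in NEED): b_j = 1 when H_j is in the combination and 0 otherwise. With the ID=1 values h_j = 2^(j-1), each HSUM is then a binary number.

```latex
\sum_{k=1}^{9}\binom{9}{k} = 9 + 36 + 84 + 126 + 126 + 84 + 36 + 9 + 1 = 2^{9} - 1 = 511

\text{hsum} = \sum_{j=1}^{9} b_j \, 2^{\,j-1}, \qquad b_j \in \{0,1\}, \quad (b_1,\dots,b_9) \neq (0,\dots,0)
```

Every integer from 1 to 511 has exactly one such binary representation, so each value appears once. For ID=2 every H value is doubled, which doubles every sum and gives the even values 2 to 1022.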

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
whajjar71
Calcite | Level 5

Thanks for the response. One clarification: I need to execute these summations across all 10,000 observations.

Does having more than one row change the code or process?

mkeintz
PROC Star

If you take a close look at my example, you'll see that it handles 2 rows, not just one.

But the requirement of this approach is that you need some identifier variable(s) that uniquely identify each row, for the BY statement in PROC SUMMARY.
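If the data set has no such identifier, the row number will do. A minimal sketch, assuming a hypothetical input HAVE_RAW that holds only H1-H9 (two made-up rows stand in for the real 10,000):

```sas
/* Hypothetical stand-in for the real input: H1-H9 only, no ID column */
data have_raw;
  input h1-h9;
datalines;
1 2 4 8 16 32 64 128 256
3 5 7 9 11 13 15 17 19
;
run;

/* Give every row its own ID. _N_ counts DATA step iterations, and each
   iteration reads one row, so the IDs are 1, 2, ... in input order. */
data have_id;
  set have_raw;
  id = _n_;
run;
```

HAVE_ID can then take the place of HAVE in the PROC SUMMARY step above. Because _N_ increases row by row, HAVE_ID is already sorted by ID, as the BY statement requires.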

Ksharp
Super User

That will produce 2^9 output observations for every input observation. Are you sure you want this?
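For the 10,000-row data set the OP mentioned, the output size works out as follows. The GRAYCODE loop below also emits the empty combination, hence 2^9 rather than 2^9 - 1:

```latex
2^{9} \times 10{,}000 = 512 \times 10{,}000 = 5{,}120{,}000 \ \text{output rows}
```

The PROC SUMMARY route skips the empty combination and gives 511 × 10,000 = 5,110,000 rows.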

Ksharp
Super User
data have;
infile cards dlm=',';
input h1-h9;
cards;
1,2,4,8,16,32,64,128,256
;
run;
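data want;
  set have;
  array h{*} h1-h9;
  array x{*} x1-x9;            /* 0/1 flags: is H{j} in the current combination? */
  k=-1;                        /* k<0: the first GRAYCODE call starts from all zeros */
  do i=1 to 2**dim(x);         /* 2**9 = 512 combinations, incl. the empty one */
    rc=graycode(k,of x{*});    /* move to the next combination */
    sum=0;
    do j=1 to dim(x);          /* rebuild the sum from all 9 products */
      sum+x{j}*h{j};
    end;
    output;
  end;
  keep h1-h9 sum k;
run;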
mkeintz
PROC Star

@Ksharp

I hadn't heard of GRAYCODE before, so I looked up Gray code on Wikipedia. I like the idea of using GRAYCODE to step through the combinations, but it would be nicer to avoid looping through all 9 products (h{i}*x{i}) to generate a sum at each GRAYCODE iteration.

And it seems that ought to be possible. SAS describes GRAYCODE as "generate all combinations of n items in minimal change order", and Wikipedia says the intrinsic property of a Gray code is that only one member of the combination changes at a time. I.e., each step either adds 1 element to the prior combination or removes one.
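The one-change-at-a-time property is easy to see in SAS itself. A small sketch that runs GRAYCODE over just 3 items and keeps every step; the data set name GRAY_DEMO is arbitrary:

```sas
/* Sketch: step through all 2**3 = 8 subsets of 3 items with GRAYCODE.
   K<0 on the first call asks GRAYCODE to start from the empty subset. */
data gray_demo;
  array x{3};                 /* creates X1-X3, the 0/1 membership flags */
  k = -1;
  do step = 1 to 2**dim(x);
    rc = graycode(k, of x{*});
    output;                   /* one row per subset visited */
  end;
run;

proc print data=gray_demo noobs;
  var step x1-x3 k rc;
run;
```

Consecutive rows of GRAY_DEMO should differ in exactly one of X1-X3, with K tracking the subset size.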

That suggests that, to take full advantage of GRAYCODE, one could update SUM iteratively (by either adding or subtracting one H value) instead of calculating it from scratch. What I don't immediately see is how best to identify the added or removed element, but doing so could improve efficiency a lot, especially for large data sets and long arrays.
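In symbols, the suggestion is the recurrence below. Here e_i and s_i are notation introduced for this sketch: e_i is the element that changes at step i, and s_i is +1 if it is added and -1 if it is removed.

```latex
S_1 = 0, \qquad S_i = S_{i-1} + s_i \, h_{e_i}, \qquad i = 2, \dots, 2^{9}
```

Per input row, that is 2^9 - 1 = 511 single additions or subtractions, compared with 2^9 × 9 = 4,608 multiply-and-add steps when every sum is rebuilt from all 9 products.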

Ksharp
Super User
@mkeintz
I agree with you. That would lead to using SAS/IML, and I doubt the OP has SAS/IML licensed.
mkeintz
PROC Star

Here's a relatively simple way to do the task in a data step:

data have;
  input id h1-h9;
datalines;
1 256 128  64 32 15 8  4 2 1
2 512 256 128 64 32 16 8 4 2
3 1 2 4  8 16 32  64 128 256
4 2 4 8 16 32 64 128 256 512
run;

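%let dim=9;                        /* number of H variables                        */
%let ncombo=%eval(2**&dim);        /* 2**9 = 512 combinations, incl. the empty one */
data want (keep=id i h:);          /* h: keeps H1-H9, HCOUNT and HSUM              */
  if _n_=1 then do;
    %grcode_setup(size=&dim);      /* fill the two lookup arrays once              */
  end;
  set have;
  array h{&dim};
  hcount=0;                        /* number of H values in the current combination */
  hsum=0;                          /* running sum of those H values                 */

  do i=1 to &ncombo;
    hcount= hcount + _graycode_sign{i};
    hsum =  hsum + h{_graycode_element{i}}*_graycode_sign{i} ;
    output;
  end;
run;
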
The IF _N_=1 block calls a macro that steps through all combinations of &DIM items in Gray code order. But instead of updating an array of dummies (as the SAS GRAYCODE function does), it uses the underlying Gray code algorithm to build two other arrays, focused on the element to be added to or removed from the combination:

 (1) _graycode_element{i}, the element to add or remove at the i'th iteration

 (2) _graycode_sign{i}, -1 (remove) or +1 (add) at the i'th iteration (0 at iteration 1, the empty combination)

The DATA step then uses these arrays to identify the element to add or remove at each iteration, maintaining a running HSUM (and HCOUNT).

Also, if you actually do want to maintain an array of dummies, as the GRAYCODE function does, add

    array dum{&dim};
    do _j=1 to &dim; dum{_j}=0; end;

prior to the do loop. The reset has to run for every row: after the last iteration only the &DIM'th element is still flagged, so the dummies must start each row at zero. Inside the do loop add:

   dum{_graycode_element{i}} = dum{_graycode_element{i}} + _graycode_sign{i};

and add DUM: to the KEEP= list so the dummies are written to WANT.

Here's the macro being called:

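%macro grcode_setup(size=);
  %local size nc;
  %let nc= %eval(2**&size);

  /* Defaults cover iteration 1: element 1 with sign 0, i.e. the empty combination */
  array _graycode_element{&nc} _temporary_ (&nc*1);
  array _graycode_sign{&nc}    _temporary_ (&nc*0);
  do _digit=1 to &size;
    _d=_digit;                 /* positive, so this element's first change is an add */
    /* element _digit changes every 2**_digit iterations, starting at 2**(_digit-1)+1 */
    do _i = 2**(_digit-1)+1 to &nc by 2**_digit;
      _graycode_element{_i}=_digit;
      _graycode_sign{_i}=sign(_d);
      _d=-1*_d;                /* alternate between add (+1) and remove (-1) */
    end;
  end;
  drop _digit _d _i;
%mend grcode_setup;
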
Notice that the macro does not fill the arrays in order from I=1 to &NC. Instead, it works one element at a time: first it identifies all the iterations at which element 1 changes, then all those at which element 2 changes, and so on. For element 1, the sequence of states is 0110, meaning it changes every 2nd iteration, starting with iteration 2. For element 2, the sequence is 00111100 (changing every 4th iteration, starting with iteration 3). Etc.
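Read as a formula, the macro's inner DO loop says the following. This is a restatement of the code, with e for the element number, n for &SIZE, and m counting that element's changes. The table works it out for n = 3.

```latex
i = 2^{\,e-1} + 1 + m \, 2^{\,e}, \qquad \text{sign} = (-1)^{m}, \qquad m = 0, 1, \dots, 2^{\,n-e} - 1

\begin{array}{c|cccccccc}
i & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline
\text{element} & 1 & 1 & 2 & 1 & 3 & 1 & 2 & 1 \\
\text{sign} & 0 & +1 & +1 & -1 & +1 & +1 & -1 & -1 \\
\text{combination} & \varnothing & \{1\} & \{1,2\} & \{2\} & \{2,3\} & \{1,2,3\} & \{1,3\} & \{3\}
\end{array}
```

Iteration 1 is not produced by the loop. It keeps the arrays' initial values (element 1, sign 0), which leaves the combination empty.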

s_lassen
Meteorite | Level 14

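@mkeintz wrote: "What I don't immediately see is how to best identify the added or removed element, but one could improve efficiency a lot."

RC marks the spot: the value returned by GRAYCODE is the index of the element that was just added or removed, so only that one H value has to be added to or subtracted from SUM: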

data have;
infile cards dlm=',';
input h1-h9;
cards;
1,2,4,8,16,32,64,128,256
;
run;
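data want2;
  set have;
  array h{*} h1-h9;
  array x{*} x1-x9;
  k=-1;
  sum=0;
  rc=graycode(k,of x{*});      /* first call: all X are 0, the empty combination */
  output;
  do i=1 to 2**dim(x)-1;       /* the remaining 511 combinations */
    rc=graycode(k,of x{*});    /* RC = index of the element that changed */
    if x{rc} then
      sum=sum+h{rc};           /* element was added */
    else
      sum=sum-h{rc};           /* element was removed */
    output;
  end;
  keep h1-h9 sum k;
run;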
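To check that the incremental update agrees with the full recomputation in Ksharp's step, here is a self-contained cross-check sketch on one hypothetical row (the powers-of-two values used earlier in the thread). The data set CHECK and the counters NBAD and NSTEPS are made up for illustration.

```sas
/* Cross-check sketch: update the sum incrementally via RC (as in WANT2)
   and rebuild it from all 9 products (as in Ksharp's WANT) at every step.
   NBAD counts disagreements; it should end at 0, with NSTEPS = 512. */
data check (keep=nbad nsteps);
  array h{9} _temporary_ (1 2 4 8 16 32 64 128 256); /* hypothetical row */
  array x{9};                    /* 0/1 flags filled in by GRAYCODE */
  k = -1;
  rc = graycode(k, of x{*});     /* first call: empty combination */
  sum_incr = 0;
  nbad = 0;
  nsteps = 1;
  do step = 1 to 2**dim(x) - 1;
    rc = graycode(k, of x{*});   /* RC = index of the element that changed */
    if x{rc} then sum_incr = sum_incr + h{rc};  /* element added   */
    else sum_incr = sum_incr - h{rc};           /* element removed */
    sum_full = 0;
    do j = 1 to dim(x);          /* full recomputation, for comparison */
      sum_full = sum_full + x{j}*h{j};
    end;
    nbad = nbad + (sum_full ne sum_incr);
    nsteps = nsteps + 1;
  end;
  output;
run;

proc print data=check noobs;
run;
```

The H values here are integers, so the exact NE comparison is safe. With non-integer H values, the running sum can drift slightly from the recomputed one, and a small tolerance would be the prudent choice.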

ballardw
Super User

How about generating, for a single observation with 4 variables instead of 9, the output you want, so we can see what you have in mind in a more concrete example?

I'm not sure some of the comments about the number of variables involved are sinking in, and doing this by hand for a smaller set may demonstrate the concerns raised.
