Solved: Re: Calculate averages of every possible combination (only once)

djrisks · Posted 05-19-2017 05:05 PM

Hi,

I will like to create the averages of all possible combinations using the dataset which can be created from the SAS code below please.

data input_data;
  subject = 1; diff = 4.0; output;
  subject = 2; diff = 10.4; output;
  subject = 3; diff = 11.8; output;
  subject = 4; diff = 11.9; output;
  subject = 5; diff = 17.4; output;
  subject = 6; diff = 20.4; output;
  subject = 7; diff = 21.1; output;
  subject = 8; diff = 25.0; output;
  subject = 9; diff = 25.6; output;
  subject = 10; diff = 28.8; output;
  subject = 11; diff = 29.8; output;
run;

I am trying to get my output to be something similar to the image below please. I only want to calculate the averages between every combination once, like in the image below. I have tried doing this by using the cartesian join, however this creates every single average, and I only need 66 of them. I think I may have to use the allcombi function or Proc IML?

Many thanks in advanced for your help.

Kriss

art297 · Posted 05-19-2017 05:37 PM

Here is one way:

data input_data;
  subject = 1; diff = 4.0; output;
  subject = 2; diff = 10.4; output;
  subject = 3; diff = 11.8; output;
  subject = 4; diff = 11.9; output;
  subject = 5; diff = 17.4; output;
  subject = 6; diff = 20.4; output;
  subject = 7; diff = 21.1; output;
  subject = 8; diff = 25.0; output;
  subject = 9; diff = 25.6; output;
  subject = 10; diff = 28.8; output;
  subject = 11; diff = 29.8; output;
run;

proc sql;
  create table want as
   select a.subject,b.subject as subject2,
          mean(a.diff,b.diff) as average
     from input_data a, input_data b
       where a.subject le b.subject
         order by subject,subject2
  ;
quit;

proc transpose data=want 
               out=want (drop=_:) prefix=Subject;
  var average;
  by subject;
  id subject2;
run;

proc print data=want;
run;

Art, CEO, AnalystFinder.com

View solution in original post

art297 · Posted 05-19-2017 05:37 PM

Here is one way:

data input_data;
  subject = 1; diff = 4.0; output;
  subject = 2; diff = 10.4; output;
  subject = 3; diff = 11.8; output;
  subject = 4; diff = 11.9; output;
  subject = 5; diff = 17.4; output;
  subject = 6; diff = 20.4; output;
  subject = 7; diff = 21.1; output;
  subject = 8; diff = 25.0; output;
  subject = 9; diff = 25.6; output;
  subject = 10; diff = 28.8; output;
  subject = 11; diff = 29.8; output;
run;

proc sql;
  create table want as
   select a.subject,b.subject as subject2,
          mean(a.diff,b.diff) as average
     from input_data a, input_data b
       where a.subject le b.subject
         order by subject,subject2
  ;
quit;

proc transpose data=want 
               out=want (drop=_:) prefix=Subject;
  var average;
  by subject;
  id subject2;
run;

proc print data=want;
run;

Art, CEO, AnalystFinder.com

MarkWik · Posted 05-20-2017 07:44 AM

Hi @art297 , Brilliant!, If you don't mind and have the time, Could you please offer a line or two explanation to help me understand the proc sql solution picks all combinations. I so want to understand Please!

art297 · Posted 05-20-2017 08:20 AM

@MarkWik: It's called a Cartesian product. The SAS note (http://support.sas.com/kb/25/270.html) describes the result as "

When joining multiple tables, the default behavior of PROC SQL is to build all possible combinations between the tables. This is also known as the Cartesian Product.

The following note will be written to the SAS log when a Cartesian Product is created:

  
NOTE: The execution of this query involves performing one or
      more Cartesian product joins that can not be optimized.

"

As such, the code I suggested does such a join (including computing the averages) then, according to @djrisks's specs, eliminated the lower half of the diagonal, and ordered the rows and columns by subject number.

Art, CEO, AnalystFinder.com

djrisks · Posted 05-20-2017 08:50 AM

Thank you @art297! This is so helpful! @Ksharp I like the IML solution too, and thank you for helping me with it! @Astounding thank you for the array method!

🙂

Astounding · Posted 05-19-2017 08:40 PM

There are definitely ways to get ALLCOMBI to do this, but I have always programmed it by brute force. This will get you the data, but getting the report is up to you.

data want;

array diffs {11} diff01-diff11;

array inc {11} include01-include11;

do until (done);

set have end=done;

diffs{subject}=diff;

end;

length subjlist $ 11;

do include01=0, 1;

do include02=0, 1;

do include03=0, 1;

do include04=0, 1;

do include05=0, 1;

do include06=0, 1;

do include07=0, 1;

do include08=0, 1;

do include09=0, 1;

do include10=0, 1;

do include11=0, 1;

total=0;

subjlist=' ';

do i=1 to 11;

total + inc{i} * diffs{i};

substr(subjlist, i, 1) = put(inc{i}, 1.);

end;

mean = total / sum(of inc01-inc11);

output;

end; end; end; end; end; end; end; end; end; end; end;

keep diff01-diff11 subjlist mean;

run;

It's untested code, so might need some tweaking. But it should generate all the information you're seeking (plus an extra record ... the empty set with a mean of 0).

With a correction above.

Ksharp · Posted 05-20-2017 01:50 AM

Since @art297 has already given you sql solution.

I want present IML solution.

data input_data;
  subject = 1; diff = 4.0; output;
  subject = 2; diff = 10.4; output;
  subject = 3; diff = 11.8; output;
  subject = 4; diff = 11.9; output;
  subject = 5; diff = 17.4; output;
  subject = 6; diff = 20.4; output;
  subject = 7; diff = 21.1; output;
  subject = 8; diff = 25.0; output;
  subject = 9; diff = 25.6; output;
  subject = 10; diff = 28.8; output;
  subject = 11; diff = 29.8; output;
run;
options missing=' ';
proc iml;
use input_data nobs nobs;
read all var {diff};
close;
t=t(1:nobs);
w=expandgrid(diff,diff)[,:]||expandgrid(t,t);
want=full(w);
idx=loc(row(want)>col(want));
want[idx]=.;

char_diff=char(diff);
mattrib want r=char_diff c=char_diff l='';
print want;
quit;

Registration is open

SAS Training: Just a Click Away