DATA Step, Macro, Functions and more

Calculate averages of every possible combination (only once)

Accepted Solution Solved
Reply
Super Contributor
Posts: 256
Accepted Solution

Calculate averages of every possible combination (only once)

Hi,

 

I will like to create the averages of all possible combinations using the dataset which can be created from the SAS code below please.

 

data input_data;
  subject = 1; diff = 4.0; output;
  subject = 2; diff = 10.4; output;
  subject = 3; diff = 11.8; output;
  subject = 4; diff = 11.9; output;
  subject = 5; diff = 17.4; output;
  subject = 6; diff = 20.4; output;
  subject = 7; diff = 21.1; output;
  subject = 8; diff = 25.0; output;
  subject = 9; diff = 25.6; output;
  subject = 10; diff = 28.8; output;
  subject = 11; diff = 29.8; output;
run;

I am trying to get my output to be something similar to the image below please. I only want to calculate the averages between every combination once, like in the image below. I have tried doing this by using the cartesian join, however this creates every single average, and I only need 66 of them. I think I may have to use the allcombi function or Proc IML?

 

picture2.PNG

 

Many thanks in advanced for your help.

 

Kriss


Accepted Solutions
Solution
‎05-20-2017 08:48 AM
PROC Star
Posts: 7,468

Re: Calculate averages of every possible combination (only once)

Here is one way:

data input_data;
  subject = 1; diff = 4.0; output;
  subject = 2; diff = 10.4; output;
  subject = 3; diff = 11.8; output;
  subject = 4; diff = 11.9; output;
  subject = 5; diff = 17.4; output;
  subject = 6; diff = 20.4; output;
  subject = 7; diff = 21.1; output;
  subject = 8; diff = 25.0; output;
  subject = 9; diff = 25.6; output;
  subject = 10; diff = 28.8; output;
  subject = 11; diff = 29.8; output;
run;

proc sql;
  create table want as
   select a.subject,b.subject as subject2,
          mean(a.diff,b.diff) as average
     from input_data a, input_data b
       where a.subject le b.subject
         order by subject,subject2
  ;
quit;

proc transpose data=want 
               out=want (drop=_:) prefix=Subject;
  var average;
  by subject;
  id subject2;
run;

proc print data=want;
run;

Art, CEO, AnalystFinder.com

 

View solution in original post


All Replies
Solution
‎05-20-2017 08:48 AM
PROC Star
Posts: 7,468

Re: Calculate averages of every possible combination (only once)

Here is one way:

data input_data;
  subject = 1; diff = 4.0; output;
  subject = 2; diff = 10.4; output;
  subject = 3; diff = 11.8; output;
  subject = 4; diff = 11.9; output;
  subject = 5; diff = 17.4; output;
  subject = 6; diff = 20.4; output;
  subject = 7; diff = 21.1; output;
  subject = 8; diff = 25.0; output;
  subject = 9; diff = 25.6; output;
  subject = 10; diff = 28.8; output;
  subject = 11; diff = 29.8; output;
run;

proc sql;
  create table want as
   select a.subject,b.subject as subject2,
          mean(a.diff,b.diff) as average
     from input_data a, input_data b
       where a.subject le b.subject
         order by subject,subject2
  ;
quit;

proc transpose data=want 
               out=want (drop=_:) prefix=Subject;
  var average;
  by subject;
  id subject2;
run;

proc print data=want;
run;

Art, CEO, AnalystFinder.com

 

Frequent Contributor
Posts: 75

Re: Calculate averages of every possible combination (only once)

Hi @art297 , Brilliant!, If you don't mind and have the time, Could you please offer a line or two explanation to help me understand the proc sql solution picks all combinations. I so want to understand Please!

PROC Star
Posts: 7,468

Re: Calculate averages of every possible combination (only once)

[ Edited ]

@MarkWik: It's called a Cartesian product. The SAS note (http://support.sas.com/kb/25/270.html) describes the result as "

When joining multiple tables, the default behavior of PROC SQL is to build all possible combinations between the tables. This is also known as the Cartesian Product.

The following note will be written to the SAS log when a Cartesian Product is created:

  
NOTE: The execution of this query involves performing one or
      more Cartesian product joins that can not be optimized. 

"

As such, the code I suggested does such a join (including computing the averages) then, according to @djrisks's specs, eliminated the lower half of the diagonal, and ordered the rows and columns by subject number.

 

Art, CEO, AnalystFinder.com

Super Contributor
Posts: 256

Re: Calculate averages of every possible combination (only once)

[ Edited ]

Thank you @art297! This is so helpful! @Ksharp I like the IML solution too, and thank you for helping me with it! @Astounding thank you for the array method!

 

:-)

Super User
Posts: 5,498

Re: Calculate averages of every possible combination (only once)

[ Edited ]

There are definitely ways to get ALLCOMBI to do this, but I have always programmed it by brute force.  This will get you the data, but getting the report is up to you.

 

data want;

array diffs {11} diff01-diff11;

array inc {11} include01-include11;

do until (done);

   set have end=done;

   diffs{subject}=diff;

end;

length subjlist $ 11;

do include01=0, 1;

do include02=0, 1;

do include03=0, 1;

do include04=0, 1;

do include05=0, 1;

do include06=0, 1;

do include07=0, 1;

do include08=0, 1;

do include09=0, 1;

do include10=0, 1;

do include11=0, 1;

   total=0;

   subjlist=' ';

   do i=1 to 11;

       total + inc{i} * diffs{i};

       substr(subjlist, i, 1) = put(inc{i}, 1.);

   end;

   mean = total / sum(of inc01-inc11);

   output;

end; end; end; end; end; end; end; end; end; end; end;

keep diff01-diff11 subjlist mean;

run;

 

It's untested code, so might need some tweaking.  But it should generate all the information you're seeking (plus an extra record ... the empty set with a mean of 0).

 

With a correction above.

Super User
Posts: 10,020

Re: Calculate averages of every possible combination (only once)

[ Edited ]

Since @art297 has already given you sql solution. 

I want present IML solution.

 

data input_data;
  subject = 1; diff = 4.0; output;
  subject = 2; diff = 10.4; output;
  subject = 3; diff = 11.8; output;
  subject = 4; diff = 11.9; output;
  subject = 5; diff = 17.4; output;
  subject = 6; diff = 20.4; output;
  subject = 7; diff = 21.1; output;
  subject = 8; diff = 25.0; output;
  subject = 9; diff = 25.6; output;
  subject = 10; diff = 28.8; output;
  subject = 11; diff = 29.8; output;
run;
options missing=' ';
proc iml;
use input_data nobs nobs;
read all var {diff};
close;
t=t(1:nobs);
w=expandgrid(diff,diff)[,:]||expandgrid(t,t);
want=full(w);
idx=loc(row(want)>col(want));
want[idx]=.;

char_diff=char(diff);
mattrib want r=char_diff c=char_diff l='';
print want;
quit;

x.png 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 199 views
  • 8 likes
  • 5 in conversation