Statistical programming, matrix languages, and more

Matrix calculation instead of a loop

Reply
Occasional Contributor
Posts: 5

Matrix calculation instead of a loop

Dear all,

 

I'm working with 2 tables : one contains all possible combinations of 2 factors among 4:

 

comp_1 comp_2
       A         B
       A         C
       A         D
       B         C
       B         D
       C         D

 

and the other contains the statistical units :
 
             fact_A    fact_B    fact_C   fact_D
unit1          0             0           1            1 
unit2          1             0           1            1 
unit3          0             1           0            1 
unit4          1             0           1            0 
unit5          0             1           1            1 
unit6          1             1           0            0   

 

(unit1 has C and D factors ; unit 5 has B, C and D factors)

 

I would like to know how many units there are for each combination (mod_1,mod_2).

In this example, the result would be :


comp_1 comp_2  nb_units
       A           B            1
       A           C            2
       A           D            1
       B           C            1
       B           D            2
       C           D            3


I can find easily this result with a loop but my problem is that, actually, there are much more than 4 factors. And thus, the treatment is too long with a loop.

So, I would like to know if my counting could be done with matrix calculation, especially with SAS IML ?

 

Hoping that my request is clear...

 

In advance thank you to all those who will try to help me.

 

Céline

 

SAS Super FREQ
Posts: 3,232

Re: Matrix calculation instead of a loop

First use the ALLCOMB function in SAS/IML to enumerate all pairwise combinations of the columns.

You can then extract each pair and use elementwise multiplication to form a binary vector that has 1 for units that are in common between the factors. Sum up this vector to obtain the results that you want for each pair of columns.

 

 

proc iml;
colnames = {fact_A    fact_B    fact_C   fact_D};
x = {0             0           1            1, 
     1             0           1            1, 
     0             1           0            1,
     1             0           1            0, 
     0             1           1            1, 
     1             1           0            0};

comb = allcomb(ncol(x), 2);
*print comb;
count = j(nrow(comb), 1);
do i = 1 to nrow(comb);
   v = x[ ,comb[i,1]] # x[ ,comb[i,2]]; 
   count[i] = sum(v);
end;

print comb[c={Fact1 Fact2}] count;
SAS Super FREQ
Posts: 3,232

Re: Matrix calculation instead of a loop

Or if you really want a matrix solution with no loops, just form the crossproduct matrix.  The results are the sums in the strictly upper-triangular portion of the matrix:

 


results = X`*X;
print results[c=colnames r=colnames];
Occasional Contributor
Posts: 5

Re: Matrix calculation instead of a loop

Thank you for your reply.

The crossproduct matrix performs calculation very quickly.

 

I still have to find a way to exploit easily the strictly upper-triangular portion of the matrix. But thanks to your answers, I have made great progress!

 

Celine

 

SAS Super FREQ
Posts: 3,232

Re: Matrix calculation instead of a loop

You can use the ROW and COL functions to extract the values, and the NDX2SUB to convert the indices to subscripts. This requires SAS/IML 12.3, but the links show how to define the functions in earlier releases.  Here's an example,

 

proc iml;
corr = {1.0  0.6  0.5  0.4,
        0.6  1.0  0.3  0.2,
        0.5  0.3  1.0  0.1,
        0.4  0.2  0.1  1.0};
r = row(corr);
c = col(corr);
upperTri = loc(r < c); /* upper tri indices in row major order */
val = corr[upperTri];    /* vector contains n*(n-1)/2 upper triangular corr */
subs = ndx2sub(dimension(corr), upperTri);
print subs[c={"Row" "Col"}] val;
Respected Advisor
Posts: 4,645

Re: Matrix calculation instead of a loop

When you have a ton of factors, it's not clear to me which would be the easier approach.  So here's the DATA step approach.  Process the second data set only to start:

 

data want;

   set have;

   array factors {5} fact_A fact_B fact_C fact_D fact_E;

   length comp_1 comp_2 $ 1;

   do _J_=1 to 4;

      do _K_=J+1 to 5;

         if factors{_J_}=factors{_K_}=1 then do;

            comp_1 = scan(vname(factors{_J_}), -1, '_');

            comp_2 = scan(vname(factors{_K_}, -1, '_');

            output;

         end;

      end;

   end;

run;

 

proc freq data=want;

   tables comp_1 * comp_2 / noprint out=want2 (drop=percent);

run;

 

That gives you a summary data set with counts of all COMP_1 * COMP_2 combinations.  It can be merged back into your original data set pretty easily.

 

in practice, you may need longer lengths for COMP_1 and COMP_2, and you may need to alter the SCAN function to pick out the proper name.  And you may need to rename COUNT to NB_UNITS.  But the form to the program should remain.

 

Good luck.

Occasional Contributor
Posts: 5

Re: Matrix calculation instead of a loop

I prefer not to use loops but thank you very much for your help!
Post a Question
Discussion Stats
  • 6 replies
  • 399 views
  • 0 likes
  • 3 in conversation