## Matrix calculation instead of a loop

Dear all,

I'm working with 2 tables : one contains all possible combinations of 2 factors among 4:

comp_1 comp_2
A         B
A         C
A         D
B         C
B         D
C         D

and the other contains the statistical units :

fact_A    fact_B    fact_C   fact_D
unit1          0             0           1            1
unit2          1             0           1            1
unit3          0             1           0            1
unit4          1             0           1            0
unit5          0             1           1            1
unit6          1             1           0            0

(unit1 has C and D factors ; unit 5 has B, C and D factors)

I would like to know how many units there are for each combination (mod_1,mod_2).

In this example, the result would be :

comp_1 comp_2  nb_units
A           B            1
A           C            2
A           D            1
B           C            1
B           D            2
C           D            3

I can find easily this result with a loop but my problem is that, actually, there are much more than 4 factors. And thus, the treatment is too long with a loop.

So, I would like to know if my counting could be done with matrix calculation, especially with SAS IML ?

Hoping that my request is clear...

In advance thank you to all those who will try to help me.

Céline

6 REPLIES 6

## Re: Matrix calculation instead of a loop

First use the ALLCOMB function in SAS/IML to enumerate all pairwise combinations of the columns.

You can then extract each pair and use elementwise multiplication to form a binary vector that has 1 for units that are in common between the factors. Sum up this vector to obtain the results that you want for each pair of columns.

``````proc iml;
colnames = {fact_A    fact_B    fact_C   fact_D};
x = {0             0           1            1,
1             0           1            1,
0             1           0            1,
1             0           1            0,
0             1           1            1,
1             1           0            0};

comb = allcomb(ncol(x), 2);
*print comb;
count = j(nrow(comb), 1);
do i = 1 to nrow(comb);
v = x[ ,comb[i,1]] # x[ ,comb[i,2]];
count[i] = sum(v);
end;

print comb[c={Fact1 Fact2}] count;``````

## Re: Matrix calculation instead of a loop

Or if you really want a matrix solution with no loops, just form the crossproduct matrix.  The results are the sums in the strictly upper-triangular portion of the matrix:

``````
results = X`*X;
print results[c=colnames r=colnames];``````

## Re: Matrix calculation instead of a loop

The crossproduct matrix performs calculation very quickly.

I still have to find a way to exploit easily the strictly upper-triangular portion of the matrix. But thanks to your answers, I have made great progress!

Celine

## Re: Matrix calculation instead of a loop

You can use the ROW and COL functions to extract the values, and the NDX2SUB to convert the indices to subscripts. This requires SAS/IML 12.3, but the links show how to define the functions in earlier releases.  Here's an example,

``````proc iml;
corr = {1.0  0.6  0.5  0.4,
0.6  1.0  0.3  0.2,
0.5  0.3  1.0  0.1,
0.4  0.2  0.1  1.0};
r = row(corr);
c = col(corr);
upperTri = loc(r < c); /* upper tri indices in row major order */
val = corr[upperTri];    /* vector contains n*(n-1)/2 upper triangular corr */
subs = ndx2sub(dimension(corr), upperTri);
print subs[c={"Row" "Col"}] val;``````

## Re: Matrix calculation instead of a loop

When you have a ton of factors, it's not clear to me which would be the easier approach.  So here's the DATA step approach.  Process the second data set only to start:

data want;

set have;

array factors {5} fact_A fact_B fact_C fact_D fact_E;

length comp_1 comp_2 \$ 1;

do _J_=1 to 4;

do _K_=J+1 to 5;

if factors{_J_}=factors{_K_}=1 then do;

comp_1 = scan(vname(factors{_J_}), -1, '_');

comp_2 = scan(vname(factors{_K_}, -1, '_');

output;

end;

end;

end;

run;

proc freq data=want;

tables comp_1 * comp_2 / noprint out=want2 (drop=percent);

run;

That gives you a summary data set with counts of all COMP_1 * COMP_2 combinations.  It can be merged back into your original data set pretty easily.

in practice, you may need longer lengths for COMP_1 and COMP_2, and you may need to alter the SCAN function to pick out the proper name.  And you may need to rename COUNT to NB_UNITS.  But the form to the program should remain.

Good luck.

## Re: Matrix calculation instead of a loop

I prefer not to use loops but thank you very much for your help!
From The DO Loop