BookmarkSubscribeRSS Feed
Calcite | Level 5

Dear all,


I'm working with 2 tables : one contains all possible combinations of 2 factors among 4:


comp_1 comp_2
       A         B
       A         C
       A         D
       B         C
       B         D
       C         D


and the other contains the statistical units :
             fact_A    fact_B    fact_C   fact_D
unit1          0             0           1            1 
unit2          1             0           1            1 
unit3          0             1           0            1 
unit4          1             0           1            0 
unit5          0             1           1            1 
unit6          1             1           0            0   


(unit1 has C and D factors ; unit 5 has B, C and D factors)


I would like to know how many units there are for each combination (mod_1,mod_2).

In this example, the result would be :

comp_1 comp_2  nb_units
       A           B            1
       A           C            2
       A           D            1
       B           C            1
       B           D            2
       C           D            3

I can find easily this result with a loop but my problem is that, actually, there are much more than 4 factors. And thus, the treatment is too long with a loop.

So, I would like to know if my counting could be done with matrix calculation, especially with SAS IML ?


Hoping that my request is clear...


In advance thank you to all those who will try to help me.





First use the ALLCOMB function in SAS/IML to enumerate all pairwise combinations of the columns.

You can then extract each pair and use elementwise multiplication to form a binary vector that has 1 for units that are in common between the factors. Sum up this vector to obtain the results that you want for each pair of columns.



proc iml;
colnames = {fact_A    fact_B    fact_C   fact_D};
x = {0             0           1            1, 
     1             0           1            1, 
     0             1           0            1,
     1             0           1            0, 
     0             1           1            1, 
     1             1           0            0};

comb = allcomb(ncol(x), 2);
*print comb;
count = j(nrow(comb), 1);
do i = 1 to nrow(comb);
   v = x[ ,comb[i,1]] # x[ ,comb[i,2]]; 
   count[i] = sum(v);

print comb[c={Fact1 Fact2}] count;

Or if you really want a matrix solution with no loops, just form the crossproduct matrix.  The results are the sums in the strictly upper-triangular portion of the matrix:


results = X`*X;
print results[c=colnames r=colnames];
Calcite | Level 5

Thank you for your reply.

The crossproduct matrix performs calculation very quickly.


I still have to find a way to exploit easily the strictly upper-triangular portion of the matrix. But thanks to your answers, I have made great progress!





You can use the ROW and COL functions to extract the values, and the NDX2SUB to convert the indices to subscripts. This requires SAS/IML 12.3, but the links show how to define the functions in earlier releases.  Here's an example,


proc iml;
corr = {1.0  0.6  0.5  0.4,
        0.6  1.0  0.3  0.2,
        0.5  0.3  1.0  0.1,
        0.4  0.2  0.1  1.0};
r = row(corr);
c = col(corr);
upperTri = loc(r < c); /* upper tri indices in row major order */
val = corr[upperTri];    /* vector contains n*(n-1)/2 upper triangular corr */
subs = ndx2sub(dimension(corr), upperTri);
print subs[c={"Row" "Col"}] val;
Opal | Level 21

When you have a ton of factors, it's not clear to me which would be the easier approach.  So here's the DATA step approach.  Process the second data set only to start:


data want;

   set have;

   array factors {5} fact_A fact_B fact_C fact_D fact_E;

   length comp_1 comp_2 $ 1;

   do _J_=1 to 4;

      do _K_=J+1 to 5;

         if factors{_J_}=factors{_K_}=1 then do;

            comp_1 = scan(vname(factors{_J_}), -1, '_');

            comp_2 = scan(vname(factors{_K_}, -1, '_');







proc freq data=want;

   tables comp_1 * comp_2 / noprint out=want2 (drop=percent);



That gives you a summary data set with counts of all COMP_1 * COMP_2 combinations.  It can be merged back into your original data set pretty easily.


in practice, you may need longer lengths for COMP_1 and COMP_2, and you may need to alter the SCAN function to pick out the proper name.  And you may need to rename COUNT to NB_UNITS.  But the form to the program should remain.


Good luck.

Calcite | Level 5
I prefer not to use loops but thank you very much for your help!



Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 6 replies
  • 3 in conversation