Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 10-12-2015 10:00 AM
(937 views)

Dear all,

I'm working with 2 tables : one contains all possible combinations of 2 factors among 4:

comp_1 comp_2

A B

A C

A D

B C

B D

C D

and the other contains the statistical units :

fact_A fact_B fact_C fact_D

unit1 0 0 1 1

unit2 1 0 1 1

unit3 0 1 0 1

unit4 1 0 1 0

unit5 0 1 1 1

unit6 1 1 0 0

(unit1 has C and D factors ; unit 5 has B, C and D factors)

I would like to know how many units there are for each combination (mod_1,mod_2).

In this example, the result would be :

comp_1 comp_2 nb_units

A B 1

A C 2

A D 1

B C 1

B D 2

C D 3

I can find easily this result with a loop but my problem is that, actually, there are much more than 4 factors. And thus, the treatment is too long with a loop.

So, I would like to know if my counting could be done with matrix calculation, especially with SAS IML ?

Hoping that my request is clear...

In advance thank you to all those who will try to help me.

Céline

6 REPLIES 6

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

First use the ALLCOMB function in SAS/IML to enumerate all pairwise combinations of the columns.

You can then extract each pair and use elementwise multiplication to form a binary vector that has 1 for units that are in common between the factors. Sum up this vector to obtain the results that you want for each pair of columns.

```
proc iml;
colnames = {fact_A fact_B fact_C fact_D};
x = {0 0 1 1,
1 0 1 1,
0 1 0 1,
1 0 1 0,
0 1 1 1,
1 1 0 0};
comb = allcomb(ncol(x), 2);
*print comb;
count = j(nrow(comb), 1);
do i = 1 to nrow(comb);
v = x[ ,comb[i,1]] # x[ ,comb[i,2]];
count[i] = sum(v);
end;
print comb[c={Fact1 Fact2}] count;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Or if you really want a matrix solution with no loops, just form the crossproduct matrix. The results are the sums in the strictly upper-triangular portion of the matrix:

```
results = X`*X;
print results[c=colnames r=colnames];
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you for your reply.

The crossproduct matrix performs calculation very quickly.

I still have to find a way to exploit easily the strictly upper-triangular portion of the matrix. But thanks to your answers, I have made great progress!

Celine

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You can use the ROW and COL functions to extract the values, and the NDX2SUB to convert the indices to subscripts. This requires SAS/IML 12.3, but the links show how to define the functions in earlier releases. Here's an example,

```
proc iml;
corr = {1.0 0.6 0.5 0.4,
0.6 1.0 0.3 0.2,
0.5 0.3 1.0 0.1,
0.4 0.2 0.1 1.0};
r = row(corr);
c = col(corr);
upperTri = loc(r < c); /* upper tri indices in row major order */
val = corr[upperTri]; /* vector contains n*(n-1)/2 upper triangular corr */
subs = ndx2sub(dimension(corr), upperTri);
print subs[c={"Row" "Col"}] val;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

When you have a ton of factors, it's not clear to me which would be the easier approach. So here's the DATA step approach. Process the second data set only to start:

data want;

set have;

array factors {5} fact_A fact_B fact_C fact_D fact_E;

length comp_1 comp_2 $ 1;

do _J_=1 to 4;

do _K_=J+1 to 5;

if factors{_J_}=factors{_K_}=1 then do;

comp_1 = scan(vname(factors{_J_}), -1, '_');

comp_2 = scan(vname(factors{_K_}, -1, '_');

output;

end;

end;

end;

run;

proc freq data=want;

tables comp_1 * comp_2 / noprint out=want2 (drop=percent);

run;

That gives you a summary data set with counts of all COMP_1 * COMP_2 combinations. It can be merged back into your original data set pretty easily.

in practice, you may need longer lengths for COMP_1 and COMP_2, and you may need to alter the SCAN function to pick out the proper name. And you may need to rename COUNT to NB_UNITS. But the form to the program should remain.

Good luck.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I prefer not to use loops but thank you very much for your help!

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

**If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. **

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.