turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

10-12-2015 10:00 AM

Dear all,

I'm working with 2 tables : one contains all possible combinations of 2 factors among 4:

comp_1 comp_2

A B

A C

A D

B C

B D

C D

and the other contains the statistical units :

fact_A fact_B fact_C fact_D

unit1 0 0 1 1

unit2 1 0 1 1

unit3 0 1 0 1

unit4 1 0 1 0

unit5 0 1 1 1

unit6 1 1 0 0

(unit1 has C and D factors ; unit 5 has B, C and D factors)

I would like to know how many units there are for each combination (mod_1,mod_2).

In this example, the result would be :

comp_1 comp_2 nb_units

A B 1

A C 2

A D 1

B C 1

B D 2

C D 3

I can find easily this result with a loop but my problem is that, actually, there are much more than 4 factors. And thus, the treatment is too long with a loop.

So, I would like to know if my counting could be done with matrix calculation, especially with SAS IML ?

Hoping that my request is clear...

In advance thank you to all those who will try to help me.

Céline

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Celine_G

10-12-2015 10:33 AM

First use the ALLCOMB function in SAS/IML to enumerate all pairwise combinations of the columns.

You can then extract each pair and use elementwise multiplication to form a binary vector that has 1 for units that are in common between the factors. Sum up this vector to obtain the results that you want for each pair of columns.

```
proc iml;
colnames = {fact_A fact_B fact_C fact_D};
x = {0 0 1 1,
1 0 1 1,
0 1 0 1,
1 0 1 0,
0 1 1 1,
1 1 0 0};
comb = allcomb(ncol(x), 2);
*print comb;
count = j(nrow(comb), 1);
do i = 1 to nrow(comb);
v = x[ ,comb[i,1]] # x[ ,comb[i,2]];
count[i] = sum(v);
end;
print comb[c={Fact1 Fact2}] count;
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Celine_G

10-12-2015 10:39 AM

Or if you really want a matrix solution with no loops, just form the crossproduct matrix. The results are the sums in the strictly upper-triangular portion of the matrix:

```
results = X`*X;
print results[c=colnames r=colnames];
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Rick_SAS

10-13-2015 10:46 AM

Thank you for your reply.

The crossproduct matrix performs calculation very quickly.

I still have to find a way to exploit easily the strictly upper-triangular portion of the matrix. But thanks to your answers, I have made great progress!

Celine

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Celine_G

10-13-2015 02:08 PM

You can use the ROW and COL functions to extract the values, and the NDX2SUB to convert the indices to subscripts. This requires SAS/IML 12.3, but the links show how to define the functions in earlier releases. Here's an example,

```
proc iml;
corr = {1.0 0.6 0.5 0.4,
0.6 1.0 0.3 0.2,
0.5 0.3 1.0 0.1,
0.4 0.2 0.1 1.0};
r = row(corr);
c = col(corr);
upperTri = loc(r < c); /* upper tri indices in row major order */
val = corr[upperTri]; /* vector contains n*(n-1)/2 upper triangular corr */
subs = ndx2sub(dimension(corr), upperTri);
print subs[c={"Row" "Col"}] val;
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Celine_G

10-12-2015 10:49 AM

When you have a ton of factors, it's not clear to me which would be the easier approach. So here's the DATA step approach. Process the second data set only to start:

data want;

set have;

array factors {5} fact_A fact_B fact_C fact_D fact_E;

length comp_1 comp_2 $ 1;

do _J_=1 to 4;

do _K_=J+1 to 5;

if factors{_J_}=factors{_K_}=1 then do;

comp_1 = scan(vname(factors{_J_}), -1, '_');

comp_2 = scan(vname(factors{_K_}, -1, '_');

output;

end;

end;

end;

run;

proc freq data=want;

tables comp_1 * comp_2 / noprint out=want2 (drop=percent);

run;

That gives you a summary data set with counts of all COMP_1 * COMP_2 combinations. It can be merged back into your original data set pretty easily.

in practice, you may need longer lengths for COMP_1 and COMP_2, and you may need to alter the SCAN function to pick out the proper name. And you may need to rename COUNT to NB_UNITS. But the form to the program should remain.

Good luck.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Astounding

10-13-2015 10:48 AM

I prefer not to use loops but thank you very much for your help!