turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- Data Management
- /
- Forum
- /
- How to count the number of each combination of two...

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-15-2018 04:11 PM - edited 06-15-2018 04:15 PM

dataset have:

dataset want:

Variable A and B are two check-all-apply variables in original dataset.

I want to get a frequency table which explicite all combinations of A,B, and C, and count the frequency of each combinations.

Accepted Solutions

Solution

06-15-2018
06:40 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to zhouxysherry

06-15-2018 05:23 PM

Here is one way but may not be extensible easily if you are looking to do this with many more or groups of variables.

data have; input A1-A3 B1-B4 C $; datalines; 1 1 0 0 0 1 1 C1 0 1 1 0 0 1 0 C1 0 0 1 1 0 0 1 C3 ; run; data trans; set have; array as a1-a3; array bs b1-b4; do i= 1 to dim(as); do j= 1 to dim(bs); if as[i]=1 and bs[j]=1 then do; A= vname(as[i]); B= vname(bs[j]); output; end; end; end; keep a b c; run; proc freq data=trans noprint; tables A*B*C/list out=want (drop=Percent); run;

Note that the data is in a data step to test code with.

Also I believe your example output is incorrect as A2 = 1 on rows 1 and 2 and B3=1 on rows 1 and two. So the count for A2 B3 is 2 since the value of C is the same on both rows.

All Replies

Solution

06-15-2018
06:40 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to zhouxysherry

06-15-2018 05:23 PM

Here is one way but may not be extensible easily if you are looking to do this with many more or groups of variables.

data have; input A1-A3 B1-B4 C $; datalines; 1 1 0 0 0 1 1 C1 0 1 1 0 0 1 0 C1 0 0 1 1 0 0 1 C3 ; run; data trans; set have; array as a1-a3; array bs b1-b4; do i= 1 to dim(as); do j= 1 to dim(bs); if as[i]=1 and bs[j]=1 then do; A= vname(as[i]); B= vname(bs[j]); output; end; end; end; keep a b c; run; proc freq data=trans noprint; tables A*B*C/list out=want (drop=Percent); run;

Note that the data is in a data step to test code with.

Also I believe your example output is incorrect as A2 = 1 on rows 1 and 2 and B3=1 on rows 1 and two. So the count for A2 B3 is 2 since the value of C is the same on both rows.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ballardw

06-15-2018 06:40 PM

This works perfectly! Much more efficient than PROC TRANSPOSE that I was using.

Really appreciate!

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to zhouxysherry

06-15-2018 05:54 PM

```
data have;
input A1-A3 B1-B4 C $;
datalines;
1 1 0 0 0 1 1 C1
0 1 1 0 0 1 0 C1
0 0 1 1 0 0 1 C3
;
run;
data want;
retain A B C;
set have;
array t(*) a1--b4;
do i=1 to dim(t)-1;
A=vname(t(i));
do j=i+1 to dim(t);
B=vname(t(j));
if first(vname(t(i))) ne first(vname(t(j))) then if t(i)+t(j)=2 then do;count=1;output;end;
end;
end;
keep A B C Count;
run;
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to novinosrin

06-15-2018 06:46 PM

Thank you novinosrin! It's amazing that this could be finished in one step!

It works very well.

It works very well.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to zhouxysherry

06-17-2018 11:17 AM

Just for fun, here's a little different version. It sets up a reference file of all of the possible responses, and then matches the input data against it.

Tom

```
/* Set up a couple of formats */
proc format;
value Af
1 = "A1"
2 = "A2"
3 = "A3"
. = "None"
;
value Bf
1 = "B1"
2 = "B2"
3 = "B3"
4 = "B4"
. = "None"
;
run;
/* Create a dataset that contains the desired results for different input patterns */
data Patterns;
format A Af. B Bf.;
keep Pattern A B;
/* Get the A values with no B values */
do APattern = 0 to 7;
BPattern = 0;
Pattern = APattern * 16 + BPattern;
AChar = put(APattern, binary3.);
do ACount = 1 to 3;
call missing(A, B);
if substr(AChar, ACount, 1) = "1" then
A = ACount;
if ^missing(A) then
output;
end;
end;
/* Get the B values with no A values */
APattern = 0;
do BPattern = 0 to 15;
Pattern = APattern * 16 + BPattern;
BChar = put(BPattern, binary4.);
do BCount = 1 to 4;
call missing(A, B);
if substr(BChar, BCount, 1) = "1" then
B = BCount;
if ^missing(B) then
output;
end;
end;
/* Get the combinations of A values with B values */
do APattern = 0 to 7;
do BPattern = 0 to 15;
Pattern = APattern * 16 + BPattern;
AChar = put(APattern, binary3.);
BChar = put(BPattern, binary4.);
do ACount = 1 to 3;
call missing(A);
if substr(AChar, ACount, 1) = "1" then
A = ACount;
do BCount = 1 to 4;
call missing(B);
if substr(BChar, BCount, 1) = "1" then
B = BCount;
if ^missing(A) & ^missing(B) then
output;
end;
end;
end;
end;
run;
/* Get the data */
data Have;
length A1 A2 A3 B1 B2 B3 B4 8 C $2;
input A1 A2 A3 B1 B2 B3 B4 C;
SeqNo = _n_;
cards;
1 1 0 0 0 1 1 C1
0 1 1 0 0 1 0 C1
0 0 1 1 0 0 1 C3
run;
/* Create the pattern to match with */
data Inter01;
set Have;
Pattern = B4*1 + B3*2 + B2*4 + B1*8 + A3*16 + A2*32 + A1*64;
run;
/* Match the input data to the patterns */
proc sql noprint;
create table Want as
select i.A1, i.A2, i.A3, i.B1, i.B2, i.B3, i.B4, p.A, p.B, i.C
from Inter01 i inner join Patterns p on i.Pattern = p.Pattern
order by SeqNo, A, B;
quit;
```