Solved: Re: How do I define and count mutually exclusive groups?

acrowther · Posted 07-19-2017 12:00 PM

Hi,

I have a dataset that contains three ID variables: FieldA, Field, FieldC. An individual's ID can appear in one, two, or all three of these fields. Also, the number of observations per variable varies (i.e. there are different numbers of obs in FieldA, FieldB,and Field C). Below is a snapshot of the data (there are thousands of records overal).

FieldA	FieldB	FieldC
100128843	100028331	100028331
100153363	100038471	100038471
100265327	100042435	100042435
100281941	100046405	100046405
100533731	100055129	100055129
100590741	100062169	100062169
100682161	100065289	100065289
100691903	100071483	100071483
100696993	100071671	100071671
100780833	100078509	100078509
100874907	100083405	100083405
100875239	100084351	100084351

I'm trying to determine how many IDs appear in:

Only FieldA

Only FieldB

Only FieldC

Only FieldA + FieldB

Only FieldA + FieldC

Only FieldB + FieldC

Only Field A + FieldB + Field C

I'd like for these groups to be mutually exclusive. Any advice? Thanks!

Astounding · Posted 07-19-2017 12:15 PM

For simplicity, I would go for a multistep approach:

proc freq data=have noprint;

tables FieldA / out=A_counts (keep=fielda);

tables FieldB / out=B_counts (keep=fieldb);

tables FieldC / out=C_counts (keep=fieldc);

run;

data want;

merge a_counts (in=ina rename=(fielda=field))

b_counts (in=inb rename=(fieldb=field))

c_counts (in=inc rename=(fieldc=field));

by field;

length match $ 3;

if ina then match='A';

if inb then substr(match, 2, 1) = 'B';

if inc then substr(match, 3, 1) = 'C';

run;

proc freq data=want;

tables match;

run;

View solution in original post

Astounding · Posted 07-19-2017 12:15 PM

For simplicity, I would go for a multistep approach:

proc freq data=have noprint;

tables FieldA / out=A_counts (keep=fielda);

tables FieldB / out=B_counts (keep=fieldb);

tables FieldC / out=C_counts (keep=fieldc);

run;

data want;

merge a_counts (in=ina rename=(fielda=field))

b_counts (in=inb rename=(fieldb=field))

c_counts (in=inc rename=(fieldc=field));

by field;

length match $ 3;

if ina then match='A';

if inb then substr(match, 2, 1) = 'B';

if inc then substr(match, 3, 1) = 'C';

run;

proc freq data=want;

tables match;

run;

acrowther · Posted 07-19-2017 01:51 PM

That did it - thank you!!!

Ksharp · Posted 07-20-2017 10:05 AM

data have;
input FieldA	FieldB	FieldC;
cards;
100128843	100028331	100028331
100153363	100038471	100038471
100265327	100042435	100042435
100281941	100046405	100046405
100533731	100055129	100055129
100590741	100062169	100062169
100682161	100065289	100065289
100691903	100071483	100071483
100696993	100071671	100071671
100780833	100078509	100078509
100874907	100083405	100083405
100875239	100084351	100084351
;
run;
proc sql;
create table only_in_a as
select FieldA from have
except
select FieldB from have
except
select FieldC from have ;

create table only_in_a_and_b as
select FieldA from have
intersect
select FieldB from have
except
select FieldC from have ;

create table only_in_a_and_b_and_c as
select FieldA from have
intersect
select FieldB from have
intersect
select FieldC from have ;
quit;

How do I define and count mutually exclusive groups?

Re: How do I define and count mutually exclusive groups?

Re: How do I define and count mutually exclusive groups?

Re: How do I define and count mutually exclusive groups?

Re: How do I define and count mutually exclusive groups?

Ready to join fellow brilliant minds for the SAS Hackathon?