Hi,
I have a dataset that contains three ID variables: FieldA, Field, FieldC. An individual's ID can appear in one, two, or all three of these fields. Also, the number of observations per variable varies (i.e. there are different numbers of obs in FieldA, FieldB,and Field C). Below is a snapshot of the data (there are thousands of records overal).
FieldA | FieldB | FieldC |
100128843 | 100028331 | 100028331 |
100153363 | 100038471 | 100038471 |
100265327 | 100042435 | 100042435 |
100281941 | 100046405 | 100046405 |
100533731 | 100055129 | 100055129 |
100590741 | 100062169 | 100062169 |
100682161 | 100065289 | 100065289 |
100691903 | 100071483 | 100071483 |
100696993 | 100071671 | 100071671 |
100780833 | 100078509 | 100078509 |
100874907 | 100083405 | 100083405 |
100875239 | 100084351 | 100084351 |
I'm trying to determine how many IDs appear in:
Only FieldA
Only FieldB
Only FieldC
Only FieldA + FieldB
Only FieldA + FieldC
Only FieldB + FieldC
Only Field A + FieldB + Field C
I'd like for these groups to be mutually exclusive. Any advice? Thanks!
For simplicity, I would go for a multistep approach:
proc freq data=have noprint;
tables FieldA / out=A_counts (keep=fielda);
tables FieldB / out=B_counts (keep=fieldb);
tables FieldC / out=C_counts (keep=fieldc);
run;
data want;
merge a_counts (in=ina rename=(fielda=field))
b_counts (in=inb rename=(fieldb=field))
c_counts (in=inc rename=(fieldc=field));
by field;
length match $ 3;
if ina then match='A';
if inb then substr(match, 2, 1) = 'B';
if inc then substr(match, 3, 1) = 'C';
run;
proc freq data=want;
tables match;
run;
For simplicity, I would go for a multistep approach:
proc freq data=have noprint;
tables FieldA / out=A_counts (keep=fielda);
tables FieldB / out=B_counts (keep=fieldb);
tables FieldC / out=C_counts (keep=fieldc);
run;
data want;
merge a_counts (in=ina rename=(fielda=field))
b_counts (in=inb rename=(fieldb=field))
c_counts (in=inc rename=(fieldc=field));
by field;
length match $ 3;
if ina then match='A';
if inb then substr(match, 2, 1) = 'B';
if inc then substr(match, 3, 1) = 'C';
run;
proc freq data=want;
tables match;
run;
That did it - thank you!!!
data have;
input FieldA FieldB FieldC;
cards;
100128843 100028331 100028331
100153363 100038471 100038471
100265327 100042435 100042435
100281941 100046405 100046405
100533731 100055129 100055129
100590741 100062169 100062169
100682161 100065289 100065289
100691903 100071483 100071483
100696993 100071671 100071671
100780833 100078509 100078509
100874907 100083405 100083405
100875239 100084351 100084351
;
run;
proc sql;
create table only_in_a as
select FieldA from have
except
select FieldB from have
except
select FieldC from have ;
create table only_in_a_and_b as
select FieldA from have
intersect
select FieldB from have
except
select FieldC from have ;
create table only_in_a_and_b_and_c as
select FieldA from have
intersect
select FieldB from have
intersect
select FieldC from have ;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.