I have a full data set (All), and 3 sub data sets (A, B, C).
How to select the cases that are in All, but not in any of A, or B, or C?
I can merge A, B, C into a new dataset and then compare All and the newly merged dataset, but I wonder if there is a simpler way to do it?
Thanks
You can include all four datasets in one merge step:
data want;
merge All
A(in=a)
B(in=b)
C(in=c);
by your by_variables;
if a+b+c=0;
run;
You can include all four datasets in one merge step:
data want;
merge All
A(in=a)
B(in=b)
C(in=c);
by your by_variables;
if a+b+c=0;
run;
The suggested code assumes that all four datasets have been sorted by the common BY variables (whose names would, of course, replace the text "your by_variables" in the code).
a, b, c are only temporary indicator variables which are 1 or 0, depending on whether a combination of BY variable values occurs in the respective dataset. You can choose other names for them (e.g. in_a, in_b, in_c). You should definitely choose other names if variables of the same names were present in any of your four datasets, so that a name conflict would occur otherwise.
For three numbers a, b, c which are either 1 or 0, the criterion "a+b+c=0" is equivalent to "a=0 & b=0 & c=0", so you could alternatively write the latter into the IF condition. Another equivalent condition would be: not (a | b | c).
In any case, the purpose of this subsetting IF statement is to restrict the output dataset WANT to observations which meet your selection criterion: observations (whose BY value combinations are) not found in A, nor in B, nor in C. The criterion that they are in dataset All would be redundant in this data step, because if an observation does not come from A, B or C, it must come from the only remaining dataset involved: All.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.