## selecting cases not in subsets

Solved
Super Contributor
Posts: 328

# selecting cases not in subsets

I have a full data set (All), and 3 sub data sets (A, B, C).

How to select the cases that are in All, but not in any of A, or B, or C?

I can merge A, B, C into a new dataset and then compare All and the newly merged dataset, but I wonder if there is a simpler way to do it?

Thanks

Accepted Solutions
Solution
‎02-05-2016 01:46 PM
Posts: 1,242

## Re: selecting cases not in subsets

You can include all four datasets in one merge step:

``````data want;
merge All
A(in=a)
B(in=b)
C(in=c);
if a+b+c=0;
run;``````

All Replies
Solution
‎02-05-2016 01:46 PM
Posts: 1,242

## Re: selecting cases not in subsets

You can include all four datasets in one merge step:

``````data want;
merge All
A(in=a)
B(in=b)
C(in=c);
if a+b+c=0;
run;``````
Super Contributor
Posts: 328

## Re: selecting cases not in subsets

what does "if a+b+c=0; " do?
Thanks.
Posts: 1,242

## Re: selecting cases not in subsets

[ Edited ]

The suggested code assumes that all four datasets have been sorted by the common BY variables (whose names would, of course, replace the text "your by_variables" in the code).

a, b, c are only temporary indicator variables which are 1 or 0, depending on whether a combination of BY variable values occurs in the respective dataset. You can choose other names for them (e.g. in_a, in_b, in_c). You should definitely choose other names if variables of the same names were present in any of your four datasets, so that a name conflict would occur otherwise.

For three numbers a, b, c which are either 1 or 0, the criterion "a+b+c=0" is equivalent to "a=0 & b=0 & c=0", so you could alternatively write the latter into the IF condition. Another equivalent condition would be: not (a | b | c).

In any case, the purpose of this subsetting IF statement is to restrict the output dataset WANT to observations which meet your selection criterion: observations (whose BY value combinations are) not found in A, nor in B, nor in C. The criterion that they are in dataset All would be redundant in this data step, because if an observation does not come from A, B or C, it must come from the only remaining dataset involved: All.

Super Contributor
Posts: 328