02-05-2016 01:02 PM
I have a full data set (All), and 3 sub data sets (A, B, C).
How can I select the cases that are in All, but not in any of A, B, or C?
I could merge A, B, and C into a new dataset and then compare All with that merged dataset, but is there a simpler way to do it?
02-05-2016 01:24 PM - edited 02-05-2016 01:30 PM
The suggested code assumes that all four datasets have been sorted by the common BY variables (whose names would, of course, replace the text "your by_variables" in the code).
a, b, c are only temporary indicator variables which are 1 or 0, depending on whether a combination of BY variable values occurs in the respective dataset. You can choose other names for them (e.g. in_a, in_b, in_c). You should definitely choose other names if variables named a, b or c are present in any of your four datasets, because a name conflict would occur otherwise.
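The code this reply refers to is not reproduced in the thread as shown here. A minimal sketch that matches the description, assuming the indicator variables are created with the IN= dataset option of a MERGE statement, might look like the following. The key variable id (standing in for "your by_variables"), the sample values, and the KEEP= options are illustrative assumptions, not part of the original post.

```sas
/* Hypothetical sample data; ID stands in for "your by_variables". */
data all;
  input id @@;
  datalines;
1 2 3 4 5 6
;
run;

data a;
  input id @@;
  datalines;
1 2
;
run;

data b;
  input id @@;
  datalines;
2 3
;
run;

data c;
  input id @@;
  datalines;
5
;
run;

/* All four datasets must be sorted by the BY variable(s). */
proc sort data=all; by id; run;
proc sort data=a;   by id; run;
proc sort data=b;   by id; run;
proc sort data=c;   by id; run;

data want;
  /* IN= creates temporary 0/1 flags that are not written to WANT.   */
  /* KEEP=ID is an added assumption: it stops same-named variables   */
  /* in A, B or C from overwriting the values coming from ALL.       */
  merge all
        a(in=a keep=id)
        b(in=b keep=id)
        c(in=c keep=id);
  by id;
  if a+b+c=0;   /* subsetting IF: keep only keys found in none of A, B, C */
run;
```

With this sample data, WANT contains the observations with id 4 and 6, which are the only keys in All that appear in none of the three subsets.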
For three numbers a, b, c which are either 1 or 0, the criterion "a+b+c=0" is equivalent to "a=0 & b=0 & c=0", so you could alternatively write the latter into the IF condition. Another equivalent condition would be: not (a | b | c).
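If you want to convince yourself that the three conditions agree, a small check like the sketch below enumerates all eight combinations of 0/1 values. The dataset name check and the variable names sum_is_zero, all_zero, none_set and agree are made up for this illustration.

```sas
/* Enumerate every 0/1 combination of a, b, c and evaluate each condition. */
data check;
  do a = 0 to 1;
    do b = 0 to 1;
      do c = 0 to 1;
        sum_is_zero = (a+b+c = 0);
        all_zero    = (a=0 & b=0 & c=0);
        none_set    = not (a | b | c);
        /* agree is 1 when all three conditions give the same result */
        agree       = (sum_is_zero = all_zero) & (all_zero = none_set);
        output;
      end;
    end;
  end;
run;

proc print data=check;
run;
```

The variable agree is 1 in all eight rows, and the three conditions are true only in the row where a, b and c are all 0.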
In any case, the purpose of this subsetting IF statement is to restrict the output dataset WANT to observations which meet your selection criterion: observations (whose BY value combinations are) not found in A, nor in B, nor in C. The criterion that they are in dataset All would be redundant in this data step, because if an observation does not come from A, B or C, it must come from the only remaining dataset involved: All.