BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
fengyuwuzu
Pyrite | Level 9

I have a full data set (All), and 3 sub data sets (A, B, C).

How to select the cases that are in All, but not in any of A, or B, or C?

 

I can merge A, B, C into a new dataset and then compare All and the newly merged dataset, but I wonder if there is a simpler way to do it?

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

You can include all four datasets in one merge step:

data want;
merge All
      A(in=a)
      B(in=b)
      C(in=c);
by your by_variables;
if a+b+c=0;
run;

View solution in original post

4 REPLIES 4
FreelanceReinh
Jade | Level 19

You can include all four datasets in one merge step:

data want;
merge All
      A(in=a)
      B(in=b)
      C(in=c);
by your by_variables;
if a+b+c=0;
run;
fengyuwuzu
Pyrite | Level 9
what does "if a+b+c=0; " do?
Thanks.
FreelanceReinh
Jade | Level 19

The suggested code assumes that all four datasets have been sorted by the common BY variables (whose names would, of course, replace the text "your by_variables" in the code).

 

a, b, c are only temporary indicator variables which are 1 or 0, depending on whether a combination of BY variable values occurs in the respective dataset. You can choose other names for them (e.g. in_a, in_b, in_c). You should definitely choose other names if variables of the same names were present in any of your four datasets, so that a name conflict would occur otherwise.

 

For three numbers a, b, c which are either 1 or 0, the criterion "a+b+c=0" is equivalent to "a=0 & b=0 & c=0", so you could alternatively write the latter into the IF condition. Another equivalent condition would be: not (a | b | c).

 

In any case, the purpose of this subsetting IF statement is to restrict the output dataset WANT to observations which meet your selection criterion: observations (whose BY value combinations are) not found in A, nor in B, nor in C. The criterion that they are in dataset All would be redundant in this data step, because if an observation does not come from A, B or C, it must come from the only remaining dataset involved: All.

fengyuwuzu
Pyrite | Level 9
Thank you so much for your explanation!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1030 views
  • 3 likes
  • 2 in conversation