BookmarkSubscribeRSS Feed
deleted_user
Not applicable
If

1. every ID in A can be found in B

and

2. there is no other ID in B than in A,

shouldn't the following examples give the same result?

data C;
merge A(in=x) B(in=y);
by ID;
if x and y;
run;

data C;
merge A(in=x) B;
by ID;
if x;
run;
6 REPLIES 6
art297
Opal | Level 21
Ernesto,

I would think that the only instances where it might produce different results are:

1. if either file has variables with the name x or y
2. if one or both files have multiple records for some IDs

Art
SPR
Quartz | Level 8 SPR
Quartz | Level 8
Hello EmestoC,

Your conditions claim that IDs in both datasets are the same. The only possible difference you allow is to have multiple IDs in both datasets. However, this does not change the conclusion. I think the output should be the same in both merges.

Sincerely,
SPR
chang_y_chung_hotmail_com
Obsidian | Level 7
SPR wrote:

> Your conditions claim that IDs in both datasets are

> the same. The only possible difference you allow is

> to have multiple IDs in both datasets. However, this

> does not change the conclusion. I think the output

> should be the same in both merges.



If A is a subset of B and B is a subset of A, then A is equal to B for sure. But OP's question is not about set theory or of logic. And OP's assertion is easy to be shown false with a simple example like so:



   /* a counter example */


   data a;


     id=1output;


   run;


 


   data b;


     id=1; y=0output;


   run;


 


   data c;


     merge a(in=x) b(in=y);


     by id;


     if x and y;


   run;


 


   data d;


     merge a(in=x) b;


     by id;


     if x;


   run;


 


   proc compare base=c compare=d;


   run;


   /* on log -- in part


   WARNING: No matching observations were found.


   NOTE: There were 0 observations read from the data set WORK.C.


   NOTE: There were 1 observations read from the data set WORK.D.


   */

SPR
Quartz | Level 8 SPR
Quartz | Level 8
Hello Chang_y_chung,

"If A is a subset of B and B is a subset of A, then A is equal to B for sure."

How you arrived to this conclusion? OP said: "1. every ID in A can be found in B" it means that multiple obs with the same ID are possible, so A should NOT ALWAYS be equal B.

As to (in=y). my EG 4.3 produces the following message:

WARNING: The variable y exists on an input data set and is also set by an I/O statement option. The variable will not be included on any output data set and unexpected results may occur.

Sincerely,
SPR
ArtC
Rhodochrosite | Level 12
The data set option IN= creates a temporary variable, unless that variable already exists on the PDV. In that case the warning is fortunately issued. However the results are sometimes surprising.

Here the use of IN=Y forces Y to become a temporary variable and the variable Y from the data set B is dropped.
[pre]
data d;
set a(in=y) b;
run;
[/pre]
deleted_user
Not applicable
ID is a numeric variable.

My point 2 was wrong, I have now found out. There were records in B with missing values( = . ) for ID.

That explains why I got different results by the two examples.

Thank's for your answers.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 3802 views
  • 0 likes
  • 5 in conversation