DATA Step, Macro, Functions and more

Merging question.

Reply
N/A
Posts: 0

Merging question.

If

1. every ID in A can be found in B

and

2. there is no other ID in B than in A,

shouldn't the following examples give the same result?

data C;
merge A(in=x) B(in=y);
by ID;
if x and y;
run;

data C;
merge A(in=x) B;
by ID;
if x;
run;
PROC Star
Posts: 7,356

Re: Merging question.

Ernesto,

I would think that the only instances where it might produce different results are:

1. if either file has variables with the name x or y
2. if one or both files have multiple records for some IDs

Art
Super Contributor
Super Contributor
Posts: 365

Re: Merging question.

Hello EmestoC,

Your conditions claim that IDs in both datasets are the same. The only possible difference you allow is to have multiple IDs in both datasets. However, this does not change the conclusion. I think the output should be the same in both merges.

Sincerely,
SPR
Regular Contributor
Posts: 241

Re: Merging question.

SPR wrote:

> Your conditions claim that IDs in both datasets are

> the same. The only possible difference you allow is

> to have multiple IDs in both datasets. However, this

> does not change the conclusion. I think the output

> should be the same in both merges.



If A is a subset of B and B is a subset of A, then A is equal to B for sure. But OP's question is not about set theory or of logic. And OP's assertion is easy to be shown false with a simple example like so:



   /* a counter example */


   data a;


     id=1output;


   run;


 


   data b;


     id=1; y=0output;


   run;


 


   data c;


     merge a(in=x) b(in=y);


     by id;


     if x and y;


   run;


 


   data d;


     merge a(in=x) b;


     by id;


     if x;


   run;


 


   proc compare base=c compare=d;


   run;


   /* on log -- in part


   WARNING: No matching observations were found.


   NOTE: There were 0 observations read from the data set WORK.C.


   NOTE: There were 1 observations read from the data set WORK.D.


   */

Super Contributor
Super Contributor
Posts: 365

Re: Merging question.

Hello Chang_y_chung,

"If A is a subset of B and B is a subset of A, then A is equal to B for sure."

How you arrived to this conclusion? OP said: "1. every ID in A can be found in B" it means that multiple obs with the same ID are possible, so A should NOT ALWAYS be equal B.

As to (in=y). my EG 4.3 produces the following message:

WARNING: The variable y exists on an input data set and is also set by an I/O statement option. The variable will not be included on any output data set and unexpected results may occur.

Sincerely,
SPR
Valued Guide
Posts: 632

Re: Merging question.

The data set option IN= creates a temporary variable, unless that variable already exists on the PDV. In that case the warning is fortunately issued. However the results are sometimes surprising.

Here the use of IN=Y forces Y to become a temporary variable and the variable Y from the data set B is dropped.
[pre]
data d;
set a(in=y) b;
run;
[/pre]
N/A
Posts: 0

Re: Merging question.

ID is a numeric variable.

My point 2 was wrong, I have now found out. There were records in B with missing values( = . ) for ID.

That explains why I got different results by the two examples.

Thank's for your answers.
Ask a Question
Discussion stats
  • 6 replies
  • 988 views
  • 0 likes
  • 5 in conversation