BookmarkSubscribeRSS Feed
deleted_user
Not applicable
If

1. every ID in A can be found in B

and

2. there is no other ID in B than in A,

shouldn't the following examples give the same result?

data C;
merge A(in=x) B(in=y);
by ID;
if x and y;
run;

data C;
merge A(in=x) B;
by ID;
if x;
run;
6 REPLIES 6
art297
Opal | Level 21
Ernesto,

I would think that the only instances where it might produce different results are:

1. if either file has variables with the name x or y
2. if one or both files have multiple records for some IDs

Art
SPR
Quartz | Level 8 SPR
Quartz | Level 8
Hello EmestoC,

Your conditions claim that IDs in both datasets are the same. The only possible difference you allow is to have multiple IDs in both datasets. However, this does not change the conclusion. I think the output should be the same in both merges.

Sincerely,
SPR
chang_y_chung_hotmail_com
Obsidian | Level 7
SPR wrote:

> Your conditions claim that IDs in both datasets are

> the same. The only possible difference you allow is

> to have multiple IDs in both datasets. However, this

> does not change the conclusion. I think the output

> should be the same in both merges.



If A is a subset of B and B is a subset of A, then A is equal to B for sure. But OP's question is not about set theory or of logic. And OP's assertion is easy to be shown false with a simple example like so:



   /* a counter example */


   data a;


     id=1output;


   run;


 


   data b;


     id=1; y=0output;


   run;


 


   data c;


     merge a(in=x) b(in=y);


     by id;


     if x and y;


   run;


 


   data d;


     merge a(in=x) b;


     by id;


     if x;


   run;


 


   proc compare base=c compare=d;


   run;


   /* on log -- in part


   WARNING: No matching observations were found.


   NOTE: There were 0 observations read from the data set WORK.C.


   NOTE: There were 1 observations read from the data set WORK.D.


   */

SPR
Quartz | Level 8 SPR
Quartz | Level 8
Hello Chang_y_chung,

"If A is a subset of B and B is a subset of A, then A is equal to B for sure."

How you arrived to this conclusion? OP said: "1. every ID in A can be found in B" it means that multiple obs with the same ID are possible, so A should NOT ALWAYS be equal B.

As to (in=y). my EG 4.3 produces the following message:

WARNING: The variable y exists on an input data set and is also set by an I/O statement option. The variable will not be included on any output data set and unexpected results may occur.

Sincerely,
SPR
ArtC
Rhodochrosite | Level 12
The data set option IN= creates a temporary variable, unless that variable already exists on the PDV. In that case the warning is fortunately issued. However the results are sometimes surprising.

Here the use of IN=Y forces Y to become a temporary variable and the variable Y from the data set B is dropped.
[pre]
data d;
set a(in=y) b;
run;
[/pre]
deleted_user
Not applicable
ID is a numeric variable.

My point 2 was wrong, I have now found out. There were records in B with missing values( = . ) for ID.

That explains why I got different results by the two examples.

Thank's for your answers.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 3346 views
  • 0 likes
  • 5 in conversation