BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Shmuel
Garnet | Level 18

@Theo_Gh, in your log there is a message:

19   data data3;
20       merge theo.to_be_merged(in=a) theo.security_daily(in=b);
21       by gvkey;
22       if b;
23   run;
 
NOTE: MERGE statement has more than one data set with repeats of BY values.

that means, your datasets are in relation N:M per gvkey.

 

Run next test code and compare results of merge vs sql - 

data tst_nok vs tst_ok:

/* merge M:n vs SQL */
data tst1;
  key = 1; varn = 5; varx='A'; output;
  key = 1; varn = 6; varx='B'; output;
  key = 1; varn = 7; varx='C'; output;
run;

data tst2;
  key = 1; varm = 8; vary='X'; output;
  key = 1; varm = 9; vary='Y'; output;
run;

data tst_nok;
merge tst1 tst2;
  by key;
run;

proc sql;
  create table tst_ok as 
  select a.*, b.varm, b.vary
  from tst1 as a
  left join tst2 as b 
  on a.key = b.key
  order by key,varn;
quit;

I recommend use sql to join data from both datasets or

use @mkeintz code if it fits your needs.

Theo_Gh
Obsidian | Level 7
Thank you very much. Will try it.
Theo_Gh
Obsidian | Level 7

When I use sql, I do not get " MERGE statement has more than one data set with repeats of BY values" . But will it work when one data set is annual and the other is daily ?

 

Thank you.

Kurt_Bremser
Super User

You do not get this message because SQL creates a cartesian product in many-to-many situations.

See this code

data have1;
input id value1;
cards;
1 1
1 2
;
run;

data have2;
input id value2;
cards;
1 3
1 4
1 5
;
run;

proc sql;
create table want as select
a.id, a.value1, b.value2
from have1 a inner join have2 b
on a.id = b.id
;
quit;

proc print data=want noobs;
run;

The result is

id    value1    value2

 1       1         3  
 1       2         3  
 1       1         4  
 1       2         4  
 1       1         5  
 1       2         5  

Now a data step works differently:

data want2;
merge
  have1 (in=a)
  have2 (in=b)
;
by id;
if a and b;
run;

proc print data=want2 noobs;
run;

You get the NOTE: MERGE statement has more than one data set with repeats of BY values.
And this is the result:

id    value1    value2

 1       1         3  
 1       2         4  
 1       2         5  
Theo_Gh
Obsidian | Level 7
Does one have advantage over the other? Do they give significantly different research outcomes?
Sorry to you all for the back and forth; I'm new to SAS and research in general
Kurt_Bremser
Super User

@Theo_Gh wrote:
Does one have advantage over the other? Do they give significantly different research outcomes?
Sorry to you all for the back and forth; I'm new to SAS and research in general

The difference IS siginificant. Just look at the outputs of my example.

 

In most cases, when you have a many-to-many relationship, you will want to go with SQL and join every instance of A with every instance of B, thereby getting A*B records(observations).

 

If you don't want that type of expansion, it is usually best to summarize one (or more) of the datasets so you get 1 observation per key, and then do the join. Or make a conscious decision which one of the obervations per key is interesting for you at a given moment and filter for that.

 

Basically, everytime I get the NOTE about more than one datasets having repeating by values, I know I either have unexpected data or my code is faulty.

 

 

Theo_Gh
Obsidian | Level 7
How do you summarize dataset(s) to get one observation per key?
Kurt_Bremser
Super User

It depends on the contents. Do you need an overall sum, or an average? Do you want to keep an earliest or last timestamp/date, or both? Or do you have character values where one takes precedence over others, or which you might want to concatenate into a string? Or would a proc transpose make sense?

This has to be determined by the logic behind the data. Once that has been done, the code can be written.

Theo_Gh
Obsidian | Level 7

I want to thank everybody for your help. I just wanted codes but your answers to my questions meant I had to look at my data itself; I have learnt so much here. Thanks, everybody!

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 24 replies
  • 7429 views
  • 2 likes
  • 5 in conversation