BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Shmuel
Garnet | Level 18

@Theo_Gh, in your log there is a message:

19   data data3;
20       merge theo.to_be_merged(in=a) theo.security_daily(in=b);
21       by gvkey;
22       if b;
23   run;
 
NOTE: MERGE statement has more than one data set with repeats of BY values.

that means, your datasets are in relation N:M per gvkey.

 

Run next test code and compare results of merge vs sql - 

data tst_nok vs tst_ok:

/* merge M:n vs SQL */
data tst1;
  key = 1; varn = 5; varx='A'; output;
  key = 1; varn = 6; varx='B'; output;
  key = 1; varn = 7; varx='C'; output;
run;

data tst2;
  key = 1; varm = 8; vary='X'; output;
  key = 1; varm = 9; vary='Y'; output;
run;

data tst_nok;
merge tst1 tst2;
  by key;
run;

proc sql;
  create table tst_ok as 
  select a.*, b.varm, b.vary
  from tst1 as a
  left join tst2 as b 
  on a.key = b.key
  order by key,varn;
quit;

I recommend use sql to join data from both datasets or

use @mkeintz code if it fits your needs.

Theo_Gh
Obsidian | Level 7
Thank you very much. Will try it.
Theo_Gh
Obsidian | Level 7

When I use sql, I do not get " MERGE statement has more than one data set with repeats of BY values" . But will it work when one data set is annual and the other is daily ?

 

Thank you.

Kurt_Bremser
Super User

You do not get this message because SQL creates a cartesian product in many-to-many situations.

See this code

data have1;
input id value1;
cards;
1 1
1 2
;
run;

data have2;
input id value2;
cards;
1 3
1 4
1 5
;
run;

proc sql;
create table want as select
a.id, a.value1, b.value2
from have1 a inner join have2 b
on a.id = b.id
;
quit;

proc print data=want noobs;
run;

The result is

id    value1    value2

 1       1         3  
 1       2         3  
 1       1         4  
 1       2         4  
 1       1         5  
 1       2         5  

Now a data step works differently:

data want2;
merge
  have1 (in=a)
  have2 (in=b)
;
by id;
if a and b;
run;

proc print data=want2 noobs;
run;

You get the NOTE: MERGE statement has more than one data set with repeats of BY values.
And this is the result:

id    value1    value2

 1       1         3  
 1       2         4  
 1       2         5  
Theo_Gh
Obsidian | Level 7
Does one have advantage over the other? Do they give significantly different research outcomes?
Sorry to you all for the back and forth; I'm new to SAS and research in general
Kurt_Bremser
Super User

@Theo_Gh wrote:
Does one have advantage over the other? Do they give significantly different research outcomes?
Sorry to you all for the back and forth; I'm new to SAS and research in general

The difference IS siginificant. Just look at the outputs of my example.

 

In most cases, when you have a many-to-many relationship, you will want to go with SQL and join every instance of A with every instance of B, thereby getting A*B records(observations).

 

If you don't want that type of expansion, it is usually best to summarize one (or more) of the datasets so you get 1 observation per key, and then do the join. Or make a conscious decision which one of the obervations per key is interesting for you at a given moment and filter for that.

 

Basically, everytime I get the NOTE about more than one datasets having repeating by values, I know I either have unexpected data or my code is faulty.

 

 

Theo_Gh
Obsidian | Level 7
How do you summarize dataset(s) to get one observation per key?
Kurt_Bremser
Super User

It depends on the contents. Do you need an overall sum, or an average? Do you want to keep an earliest or last timestamp/date, or both? Or do you have character values where one takes precedence over others, or which you might want to concatenate into a string? Or would a proc transpose make sense?

This has to be determined by the logic behind the data. Once that has been done, the code can be written.

Theo_Gh
Obsidian | Level 7

I want to thank everybody for your help. I just wanted codes but your answers to my questions meant I had to look at my data itself; I have learnt so much here. Thanks, everybody!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 24 replies
  • 7737 views
  • 2 likes
  • 5 in conversation