Hello_there
Lapis Lazuli | Level 10

Hi,

The log note says "MERGE statement has more than one data set with repeats of BY values."

How do I avoid it?

 

data have1;
  infile datalines dsd dlm=",";
  input id $ visitnum var1 var2 var3 var4;
datalines;
001, 0, 56, 78, 39, 94,
001, 1, 54, 73, , ,
001, 1, 94, 95, , ,
001, 2, 94, 29, , ,
002, 0, 57, 73, 23, 11,
002, 1, 51, 75, , ,
002, 1, 90, 91, , ,
002, 2, 89, 30, , ,
;
run;

data have2;
  infile datalines dsd dlm=",";
  input id $ visitnum var3 var4;
datalines;
001, 1, 40, 96
002, 1, 24, 14
;
run;

desired output:

(screenshot of the desired output attached)

 

15 REPLIES
PaigeMiller
Diamond | Level 26

You can avoid it by doing merges that are not one-to-many but are one-to-one, if possible.

 

Otherwise, remember that NOTEs are a good thing; you shouldn't want to suppress them, as ignoring them can lead to unexpected and unpleasant outcomes.

--
Paige Miller
Hello_there
Lapis Lazuli | Level 10
I wish I could, but this is necessary for my use case. My logs get checked, and unfortunately this would not pass, so I would have to redo it. Is there a way to do this in PROC SQL?
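For what it's worth, a PROC SQL left join on the posted example data might look like the sketch below. This assumes the intent is to take var3/var4 from have2 where a matching id/visitnum exists and keep have1's values otherwise; the COALESCE calls and the WANT name are illustrative, not from the original post.

```sas
proc sql;
  create table want as
  select a.id, a.visitnum, a.var1, a.var2,
         /* prefer have2's value where a match exists, else keep have1's */
         coalesce(b.var3, a.var3) as var3,
         coalesce(b.var4, a.var4) as var4
  from have1 as a
  left join have2 as b
    on a.id = b.id and a.visitnum = b.visitnum;
quit;
```

Note that if have2 has duplicate keys, this join still multiplies rows silently, so it sidesteps the NOTE without removing the underlying duplication.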
Kurt_Bremser
Super User

Whenever you expect a one-to-many MERGE, but get this NOTE, your data is not what you expected, and you either need to get correct data in the first place, or deduplicate the datasets which are supposed to be unique with regards to the BY variable(s).
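A minimal deduplication sketch along these lines, assuming HAVE2 is the dataset that should be unique by ID and VISITNUM (NODUPKEY keeps the first record per key, which may or may not be the right one for your data):

```sas
/* Keep one record per id/visitnum; discarded duplicates land in have2_dups
   so they can be inspected rather than silently lost. */
proc sort data=have2 nodupkey out=have2_unique dupout=have2_dups;
  by id visitnum;
run;
```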

Tom
Super User Tom
Super User

By not having more than one dataset that has "many".

Your example data does not have any trouble.

340  data want ;
341    merge have1 have2 ;
342    by id visitnum;
343  run;

NOTE: There were 8 observations read from the data set WORK.HAVE1.
NOTE: There were 2 observations read from the data set WORK.HAVE2.
NOTE: The data set WORK.WANT has 8 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

Which of the two datasets should be the "one" dataset?  Is it HAVE2?

To test if HAVE2 has duplicates run a test like:

data have2_unique have2_dups;
  set have2 ;
  by id visitnum;
  if first.visitnum then output have2_unique;
  if not (first.visitnum and last.visitnum) then output have2_dups;
run;

 

Hello_there
Lapis Lazuli | Level 10
Hi Tom,
Thanks for taking the time to reply. My example data was not good, so that's why the log note didn't populate. I was in a hurry and couldn't think of good enough data to replicate it.

The good news is that I solved my problem by renaming the variables and doing a PROC SQL left join.
Thanks again!
Tom
Super User Tom
Super User

@Hello_there wrote:
Hi Tom,
Thanks for taking the time to reply. My example data was not good, so that's why the log note didn't populate. I was in a hurry and couldn't think of good enough data to replicate it.

The good news is that I solved my problem by renaming the variables and doing a PROC SQL left join.
Thanks again!

Unless you got rid of the duplicates you didn't solve the underlying problem, you just used a tool that wouldn't complain about it.

Hello_there
Lapis Lazuli | Level 10

The issue is that, for my use case, there are multiple records per BY group because there were two identical visits and no other unique variable to make the key variables distinct. It is common for data to be messy, and sometimes that requires a somewhat unrefined solution.

But you could be right that I didn't address the underlying problem, which might cause issues later on. I have to investigate further and collaborate with someone to see if they come to the same output as me.

Hello_there
Lapis Lazuli | Level 10

Isn't "many" defined as duplicates of the BY variables?

In my case, isn't my "many" the duplicates of id and visitnum when the visit is equal to 1? In other words, two records with id=001, visitnum=1 and two with id=002, visitnum=1.

Tom
Super User Tom
Super User

@Hello_there wrote:

Isn't "many" defined as duplicates of the BY variables?

In my case, isn't my "many" the duplicates of id and visitnum when the visit is equal to 1? In other words, two records with id=001, visitnum=1 and two with id=002, visitnum=1.


No.  Your example is showing two distinct BY groups, since the variable ID is different.  It would only be considered MANY if you used only VISITNUM as the key, but that does not make much sense with the data you posted.  If VISITNUM were the key, then you should not have the ID variable on both datasets.

If both datasets have ID and VISITNUM, then those should be the keys.

If you want to use only one of them as the key, then you should not have the other in both datasets.

Reeza
Super User

Your sample data doesn't generate those notes. Show the code used, especially because you have multiple variables that overlap (id visitnum var3 var4). 

I suspect your actual data has duplicates in each of the files so you have a many to many merge, not a one to many merge. 

 

 69         data want;
 70         merge have1 have2;
 71         by id ;
 72         run;
 
 NOTE: There were 8 observations read from the data set WORK.HAVE1.
 NOTE: There were 2 observations read from the data set WORK.HAVE2.
 NOTE: The data set WORK.WANT has 8 observations and 8 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              1241.06k
       OS Memory           20908.00k
       Timestamp           01/23/2023 04:16:32 PM
       Step Count                        61  Switch Count  2
       Page Faults                       0
       Page Reclaims                     247
       Page Swaps                        0
       Voluntary Context Switches        9
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           264
       
 
 73         
 74         data want;
 75         merge have1 have2;
 76         by id visitnum;
 77         run;
 
 NOTE: There were 8 observations read from the data set WORK.HAVE1.
 NOTE: There were 2 observations read from the data set WORK.HAVE2.
 NOTE: The data set WORK.WANT has 8 observations and 8 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              1251.15k
       OS Memory           20908.00k
       Timestamp           01/23/2023 04:16:32 PM
       Step Count                        62  Switch Count  2
       Page Faults                       0
       Page Reclaims                     184
       Page Swaps                        0
       Voluntary Context Switches        10
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           272
       
Hello_there
Lapis Lazuli | Level 10
Hi Reeza,
Thanks for looking at my issue. My example code was hastily done, so I couldn't come up with a good way to replicate my issue.
I did, however, solve my issue by renaming my variables and doing a PROC SQL left join.
Thanks for looking at it, though.
Reeza
Super User
SQL wouldn't have fixed your issue; your join conditions need to be modified.
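To illustrate the point with hypothetical data (not from the thread): when both inputs repeat the same key, a SQL join quietly produces the within-key Cartesian product, with no NOTE in the log, whereas a DATA step MERGE would at least flag it.

```sas
data dup1;
  input id $ visitnum x;
datalines;
001 1 10
001 1 11
;

data dup2;
  input id $ visitnum y;
datalines;
001 1 20
001 1 21
;

/* Each of the 2 rows in dup1 matches both rows in dup2 */
proc sql;
  create table crossed as
  select a.*, b.y
  from dup1 as a
  left join dup2 as b
    on a.id = b.id and a.visitnum = b.visitnum;
quit;
/* CROSSED has 4 rows (2 x 2), and the log shows no
   "repeats of BY values" NOTE */
```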
PaigeMiller
Diamond | Level 26

So, @Hello_there, according to the others, these notes are a good thing; they are caused by many-to-many merges, which is not what you thought you were doing.

--
Paige Miller
Hello_there
Lapis Lazuli | Level 10
Hi PaigeMiller,

I think my example was not a good one and not reflective of the true nature of my problem.
I did come up with a rather unrefined solution, but I think it achieved the end result of what I wanted.


Discussion stats
  • 15 replies
  • 2885 views
  • 4 likes
  • 6 in conversation