please I need help merging multiple datsets(5 datasets);
data adsl;
merge dm1(in=a) ds1x(in=b) ex3(in=c) vs1x(in=d) sv2(in=e);
by usubjid;
if in (a, b, c, d, e);
run;
When I merge these datasets it generate this message in the log; MERGE statement has more than one data set with repeats of BY values.
when I merged two at a time, I didn't get the message except with sv2;
data adslx;
merge dm1(in=a) ds1x(in=b);
by usubjid;
if a and b;
run;
data adsly;
merge vs1x(in=a) ex3(in=b);
by usubjid;
if a and b;
run;
The message( MERGE statement has more than one data set with repeats of BY values) occurred in the log again when decide to merge adslx and adsly
data merge_adsl;
merge adslx(in=a) adsly(in=b);
by usubjid;
if a and b;
run;
Please, how can I resolve this issue?
Hi it would be great to provide a portion of data, even dummy, for testing.
There was a similar post https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of....
I would also try the syntax :
if A and B and C and D and E;in lieu of : if in (a, b, c, d, e);
The link was very helpful, I created unique usubjid before merging; My final target is one record per subject.
Thank you;
The best solution would likely be to fix the data sets before you merge them. More specifically, why should the same USUBJID have two observations in the same data set? In fact, it has two (or more) observations in more than one incoming data set. That's what the message tells you.
If you can't make USUBJID unique, then you have to choose among more complex scenarios For example, what result do you want if the same USUBJID appears three times in one data set, and twice in another? That's a choice that depends on your knowledge of what is in the data (or what should be in the data). Start with making USUBJID unique if that's at all possible.
My final target is one record per subject. I made the usubjid unique before merging. It works.
I appreciate.
I hear you. But SAS disagrees. The message it is giving means that at least two of your incoming data sets contain multiple observations for a USUBJID.
Here's a program you can run to find the source of the problem in the DM1 data set:
proc freq data=dm1 noprint;
   tables usubjid / out=dm1_multiples (where=(count > 1) drop=percent);
run;
proc print data=dm1_multiples;
run;Unfortunately, you will need to test all the data sets. Since you are expecting no such problem, it is possible the problem lies with missing values: multiple observations where USUBJID has a missing value.
Hello sir.
I'm sorry for replying so late.
DM1 contains 742 observations. When I applied the recommendation above, dm1_multiples gave me 0 obs.
The remaining datasets, sv2_multiples, vs1x_multiples, ex3_multiples, and ds1x_multiples all give 742 obs, which are all unique. This appears to have removed duplicate obs, just like nodupkeys.
I expect to get one obs per record in my final dataset, which is 742 observations.
What do you advise, sir?
Which observation do you want when there are multiples?
The first one?
data first;
  merge DM1 sv2  vs1x ex3 ds1x ;
  by usubjid;
  if first.usubjid ;
run;or the last one?
data first;
  merge DM1 sv2  vs1x ex3 ds1x ;
  by usubjid;
  if last.usubjid ;
run;You may want to run this small example of merging sets with repeats of BY values in two (or more) sets to see what may happen with the values of other variables:
data ex1; input a b; datalines; 1 1 1 2 1 3 1 4 ; data ex2; input a c; datalines; 1 11 1 22 ; data merged; merge ex1 ex2; by a; run;
The "resolution" depends on what you need for a result when multiple sets have multiple values of the BY variables. So you need describe what the resulting data is supposed to have in that situation.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
