please I need help merging multiple datsets(5 datasets);
merge dm1(in=a) ds1x(in=b) ex3(in=c) vs1x(in=d) sv2(in=e);
if in (a, b, c, d, e);
When I merge these datasets it generate this message in the log; MERGE statement has more than one data set with repeats of BY values.
when I merged two at a time, I didn't get the message except with sv2;
merge dm1(in=a) ds1x(in=b);
if a and b;
merge vs1x(in=a) ex3(in=b);
if a and b;
The message( MERGE statement has more than one data set with repeats of BY values) occurred in the log again when decide to merge adslx and adsly
merge adslx(in=a) adsly(in=b);
if a and b;
Please, how can I resolve this issue?
Hi it would be great to provide a portion of data, even dummy, for testing.
There was a similar post https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of....
I would also try the syntax :
if A and B and C and D and E;
in lieu of : if in (a, b, c, d, e);
The best solution would likely be to fix the data sets before you merge them. More specifically, why should the same USUBJID have two observations in the same data set? In fact, it has two (or more) observations in more than one incoming data set. That's what the message tells you.
If you can't make USUBJID unique, then you have to choose among more complex scenarios For example, what result do you want if the same USUBJID appears three times in one data set, and twice in another? That's a choice that depends on your knowledge of what is in the data (or what should be in the data). Start with making USUBJID unique if that's at all possible.
I hear you. But SAS disagrees. The message it is giving means that at least two of your incoming data sets contain multiple observations for a USUBJID.
Here's a program you can run to find the source of the problem in the DM1 data set:
proc freq data=dm1 noprint; tables usubjid / out=dm1_multiples (where=(count > 1) drop=percent); run; proc print data=dm1_multiples; run;
Unfortunately, you will need to test all the data sets. Since you are expecting no such problem, it is possible the problem lies with missing values: multiple observations where USUBJID has a missing value.
I'm sorry for replying so late.
DM1 contains 742 observations. When I applied the recommendation above, dm1_multiples gave me 0 obs.
The remaining datasets, sv2_multiples, vs1x_multiples, ex3_multiples, and ds1x_multiples all give 742 obs, which are all unique. This appears to have removed duplicate obs, just like nodupkeys.
I expect to get one obs per record in my final dataset, which is 742 observations.
What do you advise, sir?
Which observation do you want when there are multiples?
The first one?
data first; merge DM1 sv2 vs1x ex3 ds1x ; by usubjid; if first.usubjid ; run;
or the last one?
data first; merge DM1 sv2 vs1x ex3 ds1x ; by usubjid; if last.usubjid ; run;
You may want to run this small example of merging sets with repeats of BY values in two (or more) sets to see what may happen with the values of other variables:
data ex1; input a b; datalines; 1 1 1 2 1 3 1 4 ; data ex2; input a c; datalines; 1 11 1 22 ; data merged; merge ex1 ex2; by a; run;
The "resolution" depends on what you need for a result when multiple sets have multiple values of the BY variables. So you need describe what the resulting data is supposed to have in that situation.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.