merging many datasets

PrinceAde · Posted 06-20-2023 09:13 AM

please I need help merging multiple datsets(5 datasets);

data adsl;
merge dm1(in=a) ds1x(in=b) ex3(in=c) vs1x(in=d) sv2(in=e);
by usubjid;
if in (a, b, c, d, e);
run;

When I merge these datasets it generate this message in the log; MERGE statement has more than one data set with repeats of BY values.

when I merged two at a time, I didn't get the message except with sv2;

data adslx;
merge dm1(in=a) ds1x(in=b);
by usubjid;
if a and b;
run;

data adsly;
merge vs1x(in=a) ex3(in=b);
by usubjid;
if a and b;
run;

The message( MERGE statement has more than one data set with repeats of BY values) occurred in the log again when decide to merge adslx and adsly

data merge_adsl;
merge adslx(in=a) adsly(in=b);
by usubjid;
if a and b;
run;

Please, how can I resolve this issue?

A_Kh · Posted 06-20-2023 10:29 AM

Hi it would be great to provide a portion of data, even dummy, for testing.
There was a similar post https://communities.sas.com/t5/SAS-Studio/MERGE-statement-has-more-than-one-data-set-with-repeats-of....

I would also try the syntax :

if A and B and C and D and E;

in lieu of : if in (a, b, c, d, e);

PrinceAde · Posted 06-21-2023 06:07 AM

The link was very helpful, I created unique usubjid before merging; My final target is one record per subject.

Thank you;

Astounding · Posted 06-20-2023 10:38 AM

The best solution would likely be to fix the data sets before you merge them. More specifically, why should the same USUBJID have two observations in the same data set? In fact, it has two (or more) observations in more than one incoming data set. That's what the message tells you.

If you can't make USUBJID unique, then you have to choose among more complex scenarios For example, what result do you want if the same USUBJID appears three times in one data set, and twice in another? That's a choice that depends on your knowledge of what is in the data (or what should be in the data). Start with making USUBJID unique if that's at all possible.

PrinceAde · Posted 06-21-2023 05:36 AM

My final target is one record per subject. I made the usubjid unique before merging. It works.

I appreciate.

Astounding · Posted 06-21-2023 07:06 AM

I hear you. But SAS disagrees. The message it is giving means that at least two of your incoming data sets contain multiple observations for a USUBJID.

Here's a program you can run to find the source of the problem in the DM1 data set:

proc freq data=dm1 noprint;
   tables usubjid / out=dm1_multiples (where=(count > 1) drop=percent);
run;
proc print data=dm1_multiples;
run;

Unfortunately, you will need to test all the data sets. Since you are expecting no such problem, it is possible the problem lies with missing values: multiple observations where USUBJID has a missing value.

PrinceAde · Posted 06-26-2023 10:10 AM

Hello sir.

I'm sorry for replying so late.

DM1 contains 742 observations. When I applied the recommendation above, dm1_multiples gave me 0 obs.

The remaining datasets, sv2_multiples, vs1x_multiples, ex3_multiples, and ds1x_multiples all give 742 obs, which are all unique. This appears to have removed duplicate obs, just like nodupkeys.

I expect to get one obs per record in my final dataset, which is 742 observations.

What do you advise, sir?

Tom · Posted 06-26-2023 01:59 PM

Which observation do you want when there are multiples?

The first one?

data first;
  merge DM1 sv2  vs1x ex3 ds1x ;
  by usubjid;
  if first.usubjid ;
run;

or the last one?

data first;
  merge DM1 sv2  vs1x ex3 ds1x ;
  by usubjid;
  if last.usubjid ;
run;

ballardw · Posted 06-20-2023 01:23 PM

You may want to run this small example of merging sets with repeats of BY values in two (or more) sets to see what may happen with the values of other variables:

data ex1;
   input a b;
datalines;
1 1
1 2
1 3
1 4
;

data ex2;
   input a c;
datalines;
1 11
1 22
;

data merged;
   merge ex1 ex2;
   by a;
run;

The "resolution" depends on what you need for a result when multiple sets have multiple values of the BY variables. So you need describe what the resulting data is supposed to have in that situation.

merging many datasets

Re: merging many datasets

Re: merging many datasets

Re: merging many datasets

Re: merging many datasets

Re: merging many datasets

Re: merging many datasets

Re: merging many datasets

Re: merging many datasets

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!