How to avoid this note for merging

Reply
Contributor
Posts: 64

How to avoid this note for merging

Note:Merge statement has more than one data set with repeats of BY values.

I have tried to putting all the common variables from  both data sets  in "BY statement "and merged. Even though I am getting the above note.

Occasional Contributor
Posts: 13

Re: How to avoid this note for merging

Posted in reply to DR_Majeti

Consider using Hash Objects.  You will love it once you get the hang of it.  You will have far more control over merge.

Super User
Posts: 5,516

Re: How to avoid this note for merging

Posted in reply to DR_Majeti

This is a problem that you have to think through before you can write a program.  The situation you are looking at is (almost certainly) a many-to-many merge:

data both;

merge a b;

by ID;

run;

What should happen if the same ID appears twice in data set A, and three times in data set B?  How many observations should the combined data set contain for that ID?  How do you want to match the set of 2 to the set of 3?

Perhaps the answer is to adjust one of the data sets so that it never contains multiple observations for the same ID.  But these are the questions to consider before a program can be written.

Good luck.

PROC Star
Posts: 1,324

Re: How to avoid this note for merging

Posted in reply to DR_Majeti

Hi,

When you merge two datasets, at least one of the datasets should be unique by the BY variables.


MERGE is good for one-to-one merging of datasets, and one-to-many.

Many-to-Many merges are almost always a very bad idea.  A many-to-many merge does NOT produce a cartesian product.  That note is telling you that you are doing a many-to-many merge.

If you want a cartesian product from a many-to-many merge, doing a join in SQL is probably the easiest approach (there are DATA STEP approaches as well, but not merge).

HTH,

--Q.

Contributor
Posts: 64

Re: How to avoid this note for merging

Posted in reply to DR_Majeti

Hi

I have two data sets one is VS and other is adprdtrt.sas7bdat ... two have 7 common variables i have kept all the seven in BY statement.

there are more observations same .. variables common are Subjid usubjid studyid extn arm seq seqc ..

Super User
Posts: 19,864

Re: How to avoid this note for merging

Posted in reply to DR_Majeti

The warning is telling you there's something wrong with your merge, you need to clean the data first and understand the merge.

My suggestion:

If you can find the observations that are causing the warning, merge them and see what the results look like and try and figure out what you want.

You can isolate the obs by running the program first with half of one dataset, if no error, then move on the second half of the dataset.  Then repeat, splitting down until you find the groups that are causing the issue.

PROC Star
Posts: 1,324

Re: How to avoid this note for merging

Posted in reply to DR_Majeti

You are merging

  by SubjID USubjID Extn Arm Seq SeqC

In order to avoid that warning, one of your datasets should be unique by those variables.  Do you expect that to be true?

There are lots of ways to check for duplicates.  One way with a data step is:

data _null_ ;

  set VS ;

  by SubjID USubjID Extn Arm Seq SeqC ;

  if not (first.SeqC and last.SeqC) then put "Duplicate found ! " (SubjID USubjID Extn Arm Seq SeqC) (=) ;

run;

Occasional Contributor
Posts: 16

Re: How to avoid this note for merging

Posted in reply to DR_Majeti

This note will come when you do Many to many merge. no sorting is required. use Proc SQL, it is easy and more convenient. Proc sql; create table final as select a.*, b.x, b.xx, b.xxx, b.xxx From a Full join b on a.id = b.id; quit;

Contributor
Posts: 64

Re: How to avoid this note for merging

Posted in reply to DR_Majeti

Hi all ,

Thanks for the replies..

I try all things which are said by you ..  and I will let you which one worked.. thank you again

Ask a Question
Discussion stats
  • 8 replies
  • 435 views
  • 7 likes
  • 6 in conversation