DATA Step, Macro, Functions and more

Many to Many Merge

Reply
Super Contributor
Posts: 1,041

Many to Many Merge


Hi Team,

I am doing a many to many merge and end up with the wrong results. Is there any way we can get it right without using the PROC SQL!!!

Regards

Super Contributor
Posts: 1,636

Re: Many to Many Merge

Posted in reply to robertrao

post some sample data.

Super Contributor
Posts: 1,041

Re: Many to Many Merge

While merging i used IF A   !!!

For each ID the symbol variable spans 35 times in the first dataset. BUT the disease code for that patient ranges from 0-35 meaning he might not have a code or has 35 of the codes.

I am merging this with dataset B which has ID common But has more than 35 count for ID(may be a thousand 101's) because of the splitting of the dollar amount

Also in the Dataset B the dollar amount is broken down for each patient

He might also have less than 35 records

Dataset A

ID          symbol        code

101       1                    abc

101        2                   def

101         3

101         4

..

101         35   

102

102

102

Dataset B

ID         total       broken dollar amt(to satisfy 500 in all 101's)

101      500            20

101     500             15

101     500            100

101      500           so on which totals 500

101      500

101     500

101    500

101    500

101   500

101   500

IF HE HAS ONLY 11 Records in the B dataset he is getting the dollar amount on the 11th record retained 35 times because of the first dataset having 35 observations for each patient

When there are 11 in the Dataset B I want only those 11 and not 35

Regards

Super Contributor
Posts: 543

Re: Many to Many Merge

Posted in reply to robertrao

Hi.

You cannot do a many-to-many merge via data steps if you only have one common key variable (in your case, ID) because SAS does not know how to "match" the say 11 records from B to the 35 records in A?

So you need at least another "key" variable that will help you identify uniquely each record in your data set (s).

For example, it would be very helpful if the dollar amt breakdown would have a code associated with it. Then you could sort both data sets by ID Code, then merge them by ID Code.

PROC SQL would work very well on your problem.:smileycool:

Respected Advisor
Posts: 3,156

Re: Many to Many Merge

Posted in reply to AncaTilea

I would categorize Hash() as part of Datastep implementation, and with Hash() and enough memory, I can hardly see anything that Proc SQL can do while data step can't.

Just my 2 cents,

Haikuo

Super Contributor
Posts: 1,041

Re: Many to Many Merge

Hi,

Thanks for the detailed information. I would go through the thread and get back with you if I have any more questions.

Regards

Respected Advisor
Posts: 3,156

Re: Many to Many Merge

Posted in reply to robertrao

As long as you have large enough RAM to host one of two merging datasets, Hash() will be among most efficient:

data h1;

  do id=1 to 3;

  do var1=1 to 2;

output;

end;

  end;

  run;

  data h2;

  do id=1 to 3;

do var2=3 to 5;

output;

end;

end;

run;

data want;

  if _n_=1 then do;

  if 0 then set h2;

declare hash h2(dataset:'h2', multidata:'y');

h2.definekey('id');

h2.definedata(all:'y');

h2.definedone();

  end;

  set h1;

  rc=h2.find();

do rc=0 by 0 while (rc=0);

output;

rc=h2.find_next();

end;

drop rc;

run;

proc print;run;

Haikuo

Respected Advisor
Posts: 3,156

Re: Many to Many Merge

Posted in reply to robertrao

if I am thinking what you are thinking, you probably need something from this thread: (Pay attention to Ksharp's solution):

Haikuo

Super Contributor
Posts: 1,041

Re: Many to Many Merge

Hi,

Does this NOte means we are doing something wrong???

NOTE: MERGE statement has more than one data set with repeats of BY values????

Respected Advisor
Posts: 3,156

Re: Many to Many Merge

Posted in reply to robertrao

I won't worry about it if you do have data sets like that. It basically says that both of your data sets have by variables that are duplicated. 

Haikuo

Super Contributor
Posts: 1,041

Re: Many to Many Merge

ok

I will put it this way.

I said if A and A dataset has 5000 observations.

I did a merge of A and B and final dataset has 5000 observations?? and I get that NOTE

Do I need to worry abt it??

Regards

Respected Advisor
Posts: 3,156

Re: Many to Many Merge

Posted in reply to robertrao

Karun,

At this stage, I think it is better for you to lay out what exact outcome you are expecting, preferably with  sample tables( both incoming and outcoming).  Then we can start from there.

Haikuo

Super Contributor
Posts: 543

Re: Many to Many Merge

Posted in reply to robertrao

The NOTE means that SAS does not know how to "match" the records in the two data sets.

One should never ignore that note and instead perform a correct merge.

Respected Advisor
Posts: 3,156

Re: Many to Many Merge

Posted in reply to AncaTilea

Hi AncaTilea,

I have to disagree. The NOTE is merely a reminder: " Wait a sec, are you sure this is what you want?", SAS does KNOW how to proceed under this scenario, and it does it with decent consistency. The only thing one should be certain is that given the predicable behavior of SAS, you know what you are getting.

According this thread,

one can manipulate data step to do what they want, regardless what they want makes any common sense or not.

my 2cents,

Haikuo

Super Contributor
Posts: 543

Re: Many to Many Merge

Understood.

Smiley Wink

Ask a Question
Discussion stats
  • 16 replies
  • 506 views
  • 3 likes
  • 6 in conversation