Re: Many to Many Merge

robertrao · Posted 10-30-2012 01:33 PM

Hi Team,

I am doing a many to many merge and end up with the wrong results. Is there any way we can get it right without using the PROC SQL!!!

Regards

Linlin · Posted 10-30-2012 01:37 PM

post some sample data.

robertrao · Posted 10-30-2012 02:04 PM

While merging i used IF A !!!

For each ID the symbol variable spans 35 times in the first dataset. BUT the disease code for that patient ranges from 0-35 meaning he might not have a code or has 35 of the codes.

I am merging this with dataset B which has ID common But has more than 35 count for ID(may be a thousand 101's) because of the splitting of the dollar amount

Also in the Dataset B the dollar amount is broken down for each patient

He might also have less than 35 records

Dataset A

ID symbol code

101 1 abc

101 2 def

101 3

101 4

..

101 35

102

Dataset B

ID total broken dollar amt(to satisfy 500 in all 101's)

101 500 20

101 500 15

101 500 100

101 500 so on which totals 500

101 500

IF HE HAS ONLY 11 Records in the B dataset he is getting the dollar amount on the 11th record retained 35 times because of the first dataset having 35 observations for each patient

When there are 11 in the Dataset B I want only those 11 and not 35

Regards

AncaTilea · Posted 10-30-2012 02:27 PM

Hi.

You cannot do a many-to-many merge via data steps if you only have one common key variable (in your case, ID) because SAS does not know how to "match" the say 11 records from B to the 35 records in A?

So you need at least another "key" variable that will help you identify uniquely each record in your data set (s).

For example, it would be very helpful if the dollar amt breakdown would have a code associated with it. Then you could sort both data sets by ID Code, then merge them by ID Code.

PROC SQL would work very well on your problem.:smileycool:

Haikuo · Posted 10-30-2012 02:40 PM

I would categorize Hash() as part of Datastep implementation, and with Hash() and enough memory, I can hardly see anything that Proc SQL can do while data step can't.

Just my 2 cents,

Haikuo

robertrao · Posted 10-30-2012 02:54 PM

Hi,

Thanks for the detailed information. I would go through the thread and get back with you if I have any more questions.

Regards

Haikuo · Posted 10-30-2012 01:43 PM

As long as you have large enough RAM to host one of two merging datasets, Hash() will be among most efficient:

data h1;

do id=1 to 3;

do var1=1 to 2;

output;

end;

run;

data h2;

do id=1 to 3;

do var2=3 to 5;

output;

end;

run;

data want;

if _n_=1 then do;

if 0 then set h2;

declare hash h2(dataset:'h2', multidata:'y');

h2.definekey('id');

h2.definedata(all:'y');

h2.definedone();

end;

set h1;

rc=h2.find();

do rc=0 by 0 while (rc=0);

output;

rc=h2.find_next();

end;

drop rc;

run;

proc print;run;

Haikuo

Haikuo · Posted 10-30-2012 02:36 PM

if I am thinking what you are thinking, you probably need something from this thread: (Pay attention to Ksharp's solution):

Haikuo

robertrao · Posted 10-30-2012 03:18 PM

Hi,

Does this NOte means we are doing something wrong???

NOTE: MERGE statement has more than one data set with repeats of BY values????

Haikuo · Posted 10-30-2012 03:21 PM

I won't worry about it if you do have data sets like that. It basically says that both of your data sets have by variables that are duplicated.

Haikuo

robertrao · Posted 10-30-2012 03:43 PM

ok

I will put it this way.

I said if A and A dataset has 5000 observations.

I did a merge of A and B and final dataset has 5000 observations?? and I get that NOTE

Do I need to worry abt it??

Regards

Haikuo · Posted 10-30-2012 03:47 PM

Karun,

At this stage, I think it is better for you to lay out what exact outcome you are expecting, preferably with sample tables( both incoming and outcoming). Then we can start from there.

Haikuo

AncaTilea · Posted 10-30-2012 03:32 PM

The NOTE means that SAS does not know how to "match" the records in the two data sets.

One should never ignore that note and instead perform a correct merge.

Haikuo · Posted 10-30-2012 03:45 PM

Hi AncaTilea,

I have to disagree. The NOTE is merely a reminder: " Wait a sec, are you sure this is what you want?", SAS does KNOW how to proceed under this scenario, and it does it with decent consistency. The only thing one should be certain is that given the predicable behavior of SAS, you know what you are getting.

According this thread,

one can manipulate data step to do what they want, regardless what they want makes any common sense or not.

my 2cents,

Haikuo

AncaTilea · Posted 10-30-2012 03:51 PM

Understood.

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away