BookmarkSubscribeRSS Feed
robertrao
Quartz | Level 8


Hi Team,

I am doing a many to many merge and end up with the wrong results. Is there any way we can get it right without using the PROC SQL!!!

Regards

16 REPLIES 16
Linlin
Lapis Lazuli | Level 10

post some sample data.

robertrao
Quartz | Level 8

While merging i used IF A   !!!

For each ID the symbol variable spans 35 times in the first dataset. BUT the disease code for that patient ranges from 0-35 meaning he might not have a code or has 35 of the codes.

I am merging this with dataset B which has ID common But has more than 35 count for ID(may be a thousand 101's) because of the splitting of the dollar amount

Also in the Dataset B the dollar amount is broken down for each patient

He might also have less than 35 records

Dataset A

ID          symbol        code

101       1                    abc

101        2                   def

101         3

101         4

..

101         35   

102

102

102

Dataset B

ID         total       broken dollar amt(to satisfy 500 in all 101's)

101      500            20

101     500             15

101     500            100

101      500           so on which totals 500

101      500

101     500

101    500

101    500

101   500

101   500

IF HE HAS ONLY 11 Records in the B dataset he is getting the dollar amount on the 11th record retained 35 times because of the first dataset having 35 observations for each patient

When there are 11 in the Dataset B I want only those 11 and not 35

Regards

AncaTilea
Pyrite | Level 9

Hi.

You cannot do a many-to-many merge via data steps if you only have one common key variable (in your case, ID) because SAS does not know how to "match" the say 11 records from B to the 35 records in A?

So you need at least another "key" variable that will help you identify uniquely each record in your data set (s).

For example, it would be very helpful if the dollar amt breakdown would have a code associated with it. Then you could sort both data sets by ID Code, then merge them by ID Code.

PROC SQL would work very well on your problem.:smileycool:

Haikuo
Onyx | Level 15

I would categorize Hash() as part of Datastep implementation, and with Hash() and enough memory, I can hardly see anything that Proc SQL can do while data step can't.

Just my 2 cents,

Haikuo

robertrao
Quartz | Level 8

Hi,

Thanks for the detailed information. I would go through the thread and get back with you if I have any more questions.

Regards

Haikuo
Onyx | Level 15

As long as you have large enough RAM to host one of two merging datasets, Hash() will be among most efficient:

data h1;

  do id=1 to 3;

  do var1=1 to 2;

output;

end;

  end;

  run;

  data h2;

  do id=1 to 3;

do var2=3 to 5;

output;

end;

end;

run;

data want;

  if _n_=1 then do;

  if 0 then set h2;

declare hash h2(dataset:'h2', multidata:'y');

h2.definekey('id');

h2.definedata(all:'y');

h2.definedone();

  end;

  set h1;

  rc=h2.find();

do rc=0 by 0 while (rc=0);

output;

rc=h2.find_next();

end;

drop rc;

run;

proc print;run;

Haikuo

Haikuo
Onyx | Level 15

if I am thinking what you are thinking, you probably need something from this thread: (Pay attention to Ksharp's solution):

Haikuo

robertrao
Quartz | Level 8

Hi,

Does this NOte means we are doing something wrong???

NOTE: MERGE statement has more than one data set with repeats of BY values????

Haikuo
Onyx | Level 15

I won't worry about it if you do have data sets like that. It basically says that both of your data sets have by variables that are duplicated. 

Haikuo

robertrao
Quartz | Level 8

ok

I will put it this way.

I said if A and A dataset has 5000 observations.

I did a merge of A and B and final dataset has 5000 observations?? and I get that NOTE

Do I need to worry abt it??

Regards

Haikuo
Onyx | Level 15

Karun,

At this stage, I think it is better for you to lay out what exact outcome you are expecting, preferably with  sample tables( both incoming and outcoming).  Then we can start from there.

Haikuo

AncaTilea
Pyrite | Level 9

The NOTE means that SAS does not know how to "match" the records in the two data sets.

One should never ignore that note and instead perform a correct merge.

Haikuo
Onyx | Level 15

Hi AncaTilea,

I have to disagree. The NOTE is merely a reminder: " Wait a sec, are you sure this is what you want?", SAS does KNOW how to proceed under this scenario, and it does it with decent consistency. The only thing one should be certain is that given the predicable behavior of SAS, you know what you are getting.

According this thread,

one can manipulate data step to do what they want, regardless what they want makes any common sense or not.

my 2cents,

Haikuo

AncaTilea
Pyrite | Level 9

Understood.

Smiley Wink

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 16 replies
  • 2819 views
  • 3 likes
  • 6 in conversation