Hi Team,
I am doing a many to many merge and end up with the wrong results. Is there any way we can get it right without using the PROC SQL!!!
Regards
post some sample data.
While merging i used IF A !!!
For each ID the symbol variable spans 35 times in the first dataset. BUT the disease code for that patient ranges from 0-35 meaning he might not have a code or has 35 of the codes.
I am merging this with dataset B which has ID common But has more than 35 count for ID(may be a thousand 101's) because of the splitting of the dollar amount
Also in the Dataset B the dollar amount is broken down for each patient
He might also have less than 35 records
Dataset A
ID symbol code
101 1 abc
101 2 def
101 3
101 4
..
101 35
102
102
102
Dataset B
ID total broken dollar amt(to satisfy 500 in all 101's)
101 500 20
101 500 15
101 500 100
101 500 so on which totals 500
101 500
101 500
101 500
101 500
101 500
101 500
IF HE HAS ONLY 11 Records in the B dataset he is getting the dollar amount on the 11th record retained 35 times because of the first dataset having 35 observations for each patient
When there are 11 in the Dataset B I want only those 11 and not 35
Regards
Hi.
You cannot do a many-to-many merge via data steps if you only have one common key variable (in your case, ID) because SAS does not know how to "match" the say 11 records from B to the 35 records in A?
So you need at least another "key" variable that will help you identify uniquely each record in your data set (s).
For example, it would be very helpful if the dollar amt breakdown would have a code associated with it. Then you could sort both data sets by ID Code, then merge them by ID Code.
PROC SQL would work very well on your problem.:smileycool:
I would categorize Hash() as part of Datastep implementation, and with Hash() and enough memory, I can hardly see anything that Proc SQL can do while data step can't.
Just my 2 cents,
Haikuo
Hi,
Thanks for the detailed information. I would go through the thread and get back with you if I have any more questions.
Regards
As long as you have large enough RAM to host one of two merging datasets, Hash() will be among most efficient:
data h1;
do id=1 to 3;
do var1=1 to 2;
output;
end;
end;
run;
data h2;
do id=1 to 3;
do var2=3 to 5;
output;
end;
end;
run;
data want;
if _n_=1 then do;
if 0 then set h2;
declare hash h2(dataset:'h2', multidata:'y');
h2.definekey('id');
h2.definedata(all:'y');
h2.definedone();
end;
set h1;
rc=h2.find();
do rc=0 by 0 while (rc=0);
output;
rc=h2.find_next();
end;
drop rc;
run;
proc print;run;
Haikuo
Hi,
Does this NOte means we are doing something wrong???
NOTE: MERGE statement has more than one data set with repeats of BY values????
I won't worry about it if you do have data sets like that. It basically says that both of your data sets have by variables that are duplicated.
Haikuo
ok
I will put it this way.
I said if A and A dataset has 5000 observations.
I did a merge of A and B and final dataset has 5000 observations?? and I get that NOTE
Do I need to worry abt it??
Regards
Karun,
At this stage, I think it is better for you to lay out what exact outcome you are expecting, preferably with sample tables( both incoming and outcoming). Then we can start from there.
Haikuo
The NOTE means that SAS does not know how to "match" the records in the two data sets.
One should never ignore that note and instead perform a correct merge.
Hi AncaTilea,
I have to disagree. The NOTE is merely a reminder: " Wait a sec, are you sure this is what you want?", SAS does KNOW how to proceed under this scenario, and it does it with decent consistency. The only thing one should be certain is that given the predicable behavior of SAS, you know what you are getting.
one can manipulate data step to do what they want, regardless what they want makes any common sense or not.
my 2cents,
Haikuo
Understood.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.