09-07-2012 01:05 PM
I am having issues trying to merge two datasets based on a common variable X. Dataset A has 10 records. 5 have a value of X=1, the other obs. have X=2.
Dataset B has 3 values where x=1 and say, 3 values where x=6. I only want records that match on X or are in Dataset A to begin with, so the 3 obs in dataset b
would be discarded.
I want to have 8 observations total, I am getting 10 by using the following code..
merge a (in=ina) b (in=inb);
if ina and inb then do;
output c; /* The originial record */
charge = pmt_amt; /* Carryover the needed vars in the record from B */
code = 'XX'; /* Mark as carryover */
else if ina then output c; /* Always keep Dataset A record */
What seems to be happening is the last matching record (#3) in B gets output multiple times so I get 10 records instead of 8 in this example. Why is
the flag inb set to 1 mutliple times on the last record in B? I am stumped and am facing a deadline. There must be a way to do this!
09-07-2012 01:28 PM
I'm not sure I fully understand, because I read "Dataset A has 10 records" and "I only want records that match on X or are in Dataset A to begin with". As there are 10 records "in Dataset A to begin with" this implies to me the output data set would have 10 records, as opposed to 8. Perhaps you could clarify which 2 records you think you should not have.
One thing to check is that you are outputting two records for each condition "if ina and inb" via the output statement, is this part of the problem?
09-07-2012 01:35 PM
My bad. I want the following:
1) All the records in A (10)
2) The matching records in B (3)
Desired Total 13
I am getting 15 with the last matching record in B being duplicated. I suspect the second output statement is causing the problem but I don't know how to get around this?
09-07-2012 02:01 PM
Sounds like you want a SET instead of MERGE. Are the values of variables other than the key (X) that you want merged from B onto the records from A? If not then you definitely want to SET the datasets together. Then the issue is how to determine if the particular value of X was in dataset A. Note that code below requires A is specified before B in the SET statement.
set a (in=ina) b (in=inb);
retain keepx ;
if first.x then keepx=ina;
09-07-2012 02:56 PM
Tom's data step approach is no doubt the optimal solution, here is just to show SQL's way to stack:
proc sql noprint;
create table c as
(select * from a)
(select * from b where x in (select x from a));