I am having issues trying to merge two datasets based on a common variable X. Dataset A has 10 records. 5 have a value of X=1, the other obs. have X=2.
Dataset B has 3 values where x=1 and say, 3 values where x=6. I only want records that match on X or are in Dataset A to begin with, so the 3 obs in dataset b
would be discarded.
I want to have 8 observations total, I am getting 10 by using the following code..
Data C;
merge a (in=ina) b (in=inb);
by x;
if ina and inb then do;
output c; /* The originial record */
charge = pmt_amt; /* Carryover the needed vars in the record from B */
code = 'XX'; /* Mark as carryover */
output c;
end;
else if ina then output c; /* Always keep Dataset A record */
What seems to be happening is the last matching record (#3) in B gets output multiple times so I get 10 records instead of 8 in this example. Why is
the flag inb set to 1 mutliple times on the last record in B? I am stumped and am facing a deadline. There must be a way to do this!
Hi,
I'm not sure I fully understand, because I read "Dataset A has 10 records" and "I only want records that match on X or are in Dataset A to begin with". As there are 10 records "in Dataset A to begin with" this implies to me the output data set would have 10 records, as opposed to 8. Perhaps you could clarify which 2 records you think you should not have.
One thing to check is that you are outputting two records for each condition "if ina and inb" via the output statement, is this part of the problem?
Regards,
Amir.
My bad. I want the following:
1) All the records in A (10)
2) The matching records in B (3)
Desired Total 13
I am getting 15 with the last matching record in B being duplicated. I suspect the second output statement is causing the problem but I don't know how to get around this?
Then you could do :
data want(drop=ok);
do until(last.x);
set a(in=ina) b; by x;
ok = max(ina,ok);
if ok then output;
end;
run;
PG
Sounds like you want a SET instead of MERGE. Are the values of variables other than the key (X) that you want merged from B onto the records from A? If not then you definitely want to SET the datasets together. Then the issue is how to determine if the particular value of X was in dataset A. Note that code below requires A is specified before B in the SET statement.
Data C;
set a (in=ina) b (in=inb);
by x;
retain keepx ;
if first.x then keepx=ina;
if keepx;
run;
Tom's solution looks like what I want - thanks.
Tom's data step approach is no doubt the optimal solution, here is just to show SQL's way to stack:
proc sql noprint;
create table c as
(select * from a)
union all
(select * from b where x in (select x from a));
quit;
Haikuo
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.