DATA Step, Macro, Functions and more

Correct way to merge/output records

Reply
Contributor
Posts: 32

Correct way to merge/output records

I am having issues trying to merge two datasets based on a common variable X.  Dataset A has 10 records.  5 have a value of X=1, the other obs. have X=2.

Dataset B has 3 values where x=1 and say, 3 values where x=6.  I only want records that match on X or are in Dataset A to begin with, so the 3 obs in dataset b

would be discarded.

I want to have 8 observations total, I am getting 10 by using the following code..

Data C;

    merge a (in=ina) b (in=inb);

    by x;

    if ina and inb then do;

       output c;         /* The originial record */

       charge = pmt_amt; /* Carryover the needed vars in the record from B */

       code = 'XX';      /* Mark as carryover */

       output c;

    end;

    else if ina then output c;  /* Always keep Dataset A record */


What seems to be happening is the last matching record (#3) in B gets output multiple times so I get 10 records instead of 8 in this example.  Why is

the flag inb set to 1 mutliple times on the last record in B?  I am stumped and am facing a deadline.  There must be a way to do this!

Super Contributor
Posts: 282

Re: Correct way to merge/output records

Hi,

I'm not sure I fully understand, because I read "Dataset A has 10 records" and "I only want records that match on X or are in Dataset A to begin with". As there are 10 records "in Dataset A to begin with" this implies to me the output data set would have 10 records, as opposed to 8. Perhaps you could clarify which 2 records you think you should not have.

One thing to check is that you are outputting two records for each condition "if ina and inb" via the output statement, is this part of the problem?

Regards,

Amir.

Contributor
Posts: 32

Re: Correct way to merge/output records

My bad.  I want the following:

1) All the records in A (10)

2) The matching records in B (3)

Desired Total 13

I am getting 15 with the last matching record in B being duplicated.  I suspect the second output statement is causing the problem but I don't know how to get around this?

Respected Advisor
Posts: 4,934

Re: Correct way to merge/output records

Then you could do :

data want(drop=ok);

do until(last.x);

     set a(in=ina) b; by x;

     ok = max(ina,ok);

     if ok then output;

     end;

run;

PG

PG
Super User
Super User
Posts: 7,076

Re: Correct way to merge/output records

Sounds like you want a SET instead of MERGE.  Are the values of variables other than the key (X) that you want merged from B onto the records from A?  If not then you definitely want to SET the datasets together.  Then the issue is how to determine if the particular value of X was in dataset A.  Note that code below requires A is specified before B in the SET statement.

Data C;

  set a (in=ina) b (in=inb);

  by x;

   retain keepx ;

   if first.x then keepx=ina;

   if keepx;

run;

Contributor
Posts: 32

Re: Correct way to merge/output records

Tom's solution looks like what I want - thanks.

Respected Advisor
Posts: 3,156

Re: Correct way to merge/output records

Tom's data step approach is no doubt the optimal solution, here is just to show SQL's way to stack:

proc sql noprint;

create table c as

(select * from a)

union all

(select * from b where x in (select x from a));

quit;

Haikuo

Ask a Question
Discussion stats
  • 6 replies
  • 582 views
  • 0 likes
  • 5 in conversation